Abstract. In this series of lectures we introduce the Monge-Kantorovich problem
of optimally transporting one distribution of mass onto another, where optimality
is measured against a cost function c(x,y). Connections to geometry, inequalities,
and partial differential equations will be discussed, focusing in particular on recent
developments in the regularity theory for Monge-Ampère type equations. An
application to microeconomics will also be described, which amounts to finding the
equilibrium price distribution for a monopolist marketing a multidimensional line
of products to a population of anonymous agents whose preferences are known only
statistically. ©2010 by Robert J. McCann. All rights reserved.
Preamble



This survey is based on a series of five lectures by Robert McCann (of the University 
of Toronto), delivered at a summer school on "New Vistas in Image Processing and 
Partial Differential Equations" organized 7-12 June 2010 by Irene Fonseca, Giovanni 
Leoni, and Dejan Slepcev of Carnegie Mellon University on behalf of the Center for 
Nonlinear Analysis there. The starting point for the manuscript which emerged was a 
detailed set of notes taken during those lectures by Nestor Guillen (University of Texas 
at Austin). 

These notes are intended to convey a flavor for the subject, without getting bogged
down in too many technical details. Part of the discussion is therefore impressionistic,
and some of the results are stated under the tacit requirement that the supports
of the measures $\mu^\pm$ be compact, with the understanding that they extend to
non-compactly supported measures under appropriate hypotheses [93] [59] [45] [46]
concerning the behaviour near infinity of the measures and the costs. The choice of
topics to be covered in a series of lectures is necessarily idiosyncratic. General
references for these and other topics include papers of the first author posted on the
website www.math.toronto.edu/mccann and the two books by Villani [138] [139].
Earlier surveys include the ones by Ambrosio [4], Evans [43], Urbas [136] and Rachev and
Rüschendorf [115]. Many detailed references to the literature may be found there, to
augment the bibliography of selected works included below.



1.1. Monge-Kantorovich problem: transporting ore from mines to factories. 

The problem to be discussed can be caricatured as follows: imagine we have a distribu- 
tion of iron mines across the countryside, producing a total of 1000 tonnes of iron ore 
weekly, and a distribution of factories across the countryside that consume a total of 
1000 tonnes of iron ore weekly. Knowing the cost c(x, y) per tonne of ore transported from
a mine at x to a factory at y, the problem is to decide which mines should be supplying
which factories so as to minimize the total transportation costs. 
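
As a purely illustrative sketch of this caricature, suppose there are equally many mines and factories, each handling one unit of ore, so that a plan is just a one-to-one assignment. The positions and the quadratic cost below are hypothetical choices, and the brute-force search over permutations is only practical for a handful of sites.

```python
from itertools import permutations


def monge_assignment(mines, factories, cost):
    """Brute-force the cheapest one-to-one assignment of mines to factories.

    Returns (perm, total) where mine i ships to factory perm[i].
    """
    n = len(mines)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost(mines[i], factories[perm[i]]) for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost


# Hypothetical positions on a line; each site handles one unit of ore.
mines = [0.0, 1.0, 4.0]
factories = [5.0, 0.5, 2.0]
perm, total = monge_assignment(mines, factories, lambda x, y: (x - y) ** 2)
print(perm, total)  # (1, 2, 0) 2.25
```

For this convex cost on the line the cheapest assignment pairs the mines and factories in sorted order, which is what the search returns.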

To model this problem mathematically, let the triples $(M^\pm, d^\pm, \omega^\pm)$ denote two
complete separable metric spaces $M^\pm$ (also called Polish spaces) equipped with distance
functions $d^\pm$ and Borel reference measures $\omega^\pm$. These two metric spaces will represent
the landscapes containing the mines and the factories. They will often be assumed
to be geodesic spaces, and/or to coincide. Here a geodesic space $M (= M^\pm)$ refers
to a metric space in which every pair of points $x_0, x_1 \in M$ is connected by a curve
$s \in [0,1] \mapsto x_s \in M$ satisfying

(1) $d(x_0, x_s) = s\,d(x_0, x_1)$ and $d(x_s, x_1) = (1-s)\,d(x_0, x_1)$ for all $s \in [0,1]$.

Such a curve is called a geodesic segment.

e.g. 1) Euclidean space: $M = \mathbb{R}^n$, $d(x,y) = |x-y|$, $\omega = \mathrm{Vol} = \mathcal{H}^n$ = Hausdorff
n-dimensional measure, geodesic segments take the form $x_s = (1-s)x_0 + s x_1$.

e.g. 2) Complete Riemannian manifold $(M = M^\pm, g_{ij})$, with or without boundary:
$d\omega = d\mathrm{Vol} = d\mathcal{H}^n = (\det g_{ij})^{1/2}\, d^n x$,

$\frac{1}{2} d^2(x_0, x_1) = \inf_{\{x_s \mid x(0)=x_0,\ x(1)=x_1\}} \frac{1}{2} \int_0^1 \langle \dot{x}_s, \dot{x}_s \rangle_{g(x_s)}\, ds.$

A minimizing curve $s \in [0,1] \mapsto x_s \in M$ exists by the Hopf-Rinow theorem; it satisfies
(1), and is called a Riemannian geodesic.

The distributions of mines and factories will be modeled by Borel probability measures
$\mu^+$ on $M^+$ and $\mu^-$ on $M^-$, respectively. Any Borel map $G : M^+ \to M^-$ defines
an image or pushed-forward measure $\nu = G_\#\mu^+$ on $M^-$ by

(2) $(G_\#\mu^+)[V] := \mu^+[G^{-1}(V)]$ for all $V \subset M^-$.

A central problem in optimal transportation is to find, among all maps $G : M^+ \to M^-$
pushing $\mu^+$ forward to $\mu^-$, one which minimizes the total cost

(3) $\mathrm{cost}(G) = \int_{M^+} c(x, G(x))\, d\mu^+(x).$



This problem was first proposed by Monge in 1781, taking the Euclidean distance
$c(x,y) = |x-y|$ as his cost function [104]. For more generic costs, some basic
mathematical issues such as the existence, uniqueness, and mathematical structure of the
optimizers are addressed in the second lecture below. However, nonlinearity of the
objective functional and a lack of compactness or convexity for its domain make Monge's
formulation of the problem difficult to work with. One and a half centuries later,
Kantorovich's relaxation of the problem to an (infinite-dimensional) linear program provided
a revolutionary tool [57] [5S].

1.2. Wasserstein distance and geometric applications. The minimal cost of transport
between $\mu^+$ and $\mu^-$ associated to $c(\cdot,\cdot)$ will be provisionally denoted by

(4) $W_c(\mu^+, \mu^-) = \inf_{G_\#\mu^+ = \mu^-} \mathrm{cost}(G).$

It can be thought of as quantifying the discrepancy between $\mu^+$ and $\mu^-$, and is more
properly defined using Kantorovich's formulation (11), though we shall eventually show
the two definitions coincide in many cases of interest. When $M^+ = M^-$, the costs
$c(x,y) = d^p(x,y)$ with $0 < p < 1$ occur naturally in economics and operations research,
where it is often the case that there is an economy of scale for long trips [96]. In this
case, the quantity $W_c(\mu^+, \mu^-)$ defines a metric on the space $\mathcal{P}(M)$ of Borel probability
measures on M. For $p \ge 1$ on the other hand, it is necessary to extract a p-th root to
obtain a metric

(5) $d_p(\mu^+, \mu^-) := W_c(\mu^+, \mu^-)^{1/p}$

on $\mathcal{P}(M)$ which satisfies the triangle inequality.
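
To make (5) concrete, here is a minimal sketch for two empirical measures on the real line with equally many, equally weighted sample points. It relies on the standard fact that, for the convex costs $|x-y|^p$ with $p \ge 1$ on the line, pairing the sorted samples is an optimal assignment; the function name and the sample values are hypothetical.

```python
def wasserstein_1d(xs, ys, p=2.0):
    """d_p between two equally weighted empirical measures on the line.

    Assumes len(xs) == len(ys) and p >= 1, in which case matching the
    sorted samples minimizes the average of |x - y|**p.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch needs equally many sample points")
    if p < 1:
        raise ValueError("sorted matching is only claimed optimal for p >= 1")
    w_c = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys))) / len(xs)
    return w_c ** (1.0 / p)  # extract the p-th root, as in (5)


a, b, c = [0.0, 1.0], [0.5, 3.0], [2.0, 2.5]
print(wasserstein_1d([0.0, 1.0], [1.0, 2.0]))  # translating by 1 costs distance 1
print(wasserstein_1d(a, c) <= wasserstein_1d(a, b) + wasserstein_1d(b, c))
```

The second line printed checks the triangle inequality on one hypothetical triple; it is a spot check, not a proof.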

Though the initial references dealt specifically with the case p = 1 [5S] [142], the whole
family of distances is now called Kantorovich-Rubinstein or Wasserstein metrics [41].
Apart from the interesting exception of the limiting case $d_\infty = \lim_{p\to\infty} d_p$ [98] [26], on a
compact metric space M all these metrics give rise to the same topology, namely weak-*
convergence. For non-compact M, the $d_p$ topologies differ from each other only in the
number of moments of a sequence of measures which are required to converge. Moreover,
$(\mathcal{P}(M), d_p)$ inherits geometric properties from (M, d), such as being a geodesic space.
Notions such as Ricci curvature in the underlying space (M, d) can be characterized by
the geodesic convexity first explored in [94] of certain functionals on the larger space
$\mathcal{P}(M)$, such as Boltzmann's entropy. One direction of this equivalence was proved
by Cordero-Erausquin, McCann, and Schmuckenschläger [3D] and Otto and Villani [110]
in projects which were initially independent (see also [28] and [31]), while the converse
was established by von Renesse and Sturm [116], confirming the formal arguments of
[110]. This equivalence forms the basis of Lott-Villani and Sturm's definition of lower
bounds for Ricci curvature in the metric measure space setting, without reference to any
underlying Riemannian structure [38] [127]. McCann and Topping used a similar idea to
characterize the Ricci flow [5pJ, which led Lott [57] and Topping [130] to simpler proofs
of Perelman's celebrated monotonicity results [112]. Despite the interest of these recent
developments, we shall not pursue them further in these lectures, apart from sketching
a transportation-based proof of the isoperimetric theorem whose ideas underpin many
such geometric connections.

1.3. Brenier's theorem and convex gradients. It turns out that Monge's cost
$c(x,y) = |x-y|$ is among the hardest to deal with, due to its lack of strict
convexity. For this cost, the minimizer of (3) is not generally unique, even on the line
$M^\pm = \mathbb{R}$. Existence of solutions is tricky to establish: the first 'proof', due to Sudakov
[128], relied on an unsubstantiated claim which turned out to be correct only in the
plane $M^\pm = \mathbb{R}^2$; higher dimensional arguments were given in increasing generality
by Evans-Gangbo [H], and then Ambrosio [J], Caffarelli-Feldman-McCann [21], and
Trudinger-Wang [132] independently. Simpler approaches were proposed by
Champion-DePascale [35] and Bianchini-Cavalletti [TT] more recently.

The situation for the quadratic cost $c(x,y) = |x-y|^2$ is much simpler, mirroring the
relative simplicity of the Hilbert geometry of $L^2$ among Banach spaces $L^p$ with p > 1.
Brenier [H] [TS] (and others around the same time QH] QH] [33J [HJ [33] pQ) showed 
that there is a unique [31] [JJ solution [33] , and characterized it as a convex gradient 



Theorem 1.1 (A version of Brenier's theorem). If $\mu^+ \ll d\mathrm{Vol}$ and $\mu^-$ are Borel
probability measures on $M^\pm = \mathbb{R}^n$, then there exists a convex function
$u : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ whose gradient $G = Du : \mathbb{R}^n \to \mathbb{R}^n$ pushes $\mu^+$ forward to $\mu^-$.
Apart from changes on a set of measure zero, G is the only map to arise in this way.
Moreover, G uniquely minimizes Monge's problem (3) for the cost $c(x,y) = |x-y|^2$.

Remark: In this generality the theorem was established by McCann [33], where the
assumption $\mu^+ \ll d\mathrm{Vol}$ was also relaxed. A further relaxation by Gangbo-McCann [59]
is shown to be sharp in Gigli [61].

1.4. Fully-nonlinear degenerate-elliptic Monge-Ampère type PDE. How do
partial differential equations (more specifically, fully nonlinear degenerate elliptic PDE)
enter the picture? Let's consider the constraint $G_\#\mu^+ = \mu^-$, assuming moreover that
$\mu^\pm = f^\pm\, d\mathrm{Vol}^\pm$ on $\mathbb{R}^n$ or on Riemannian manifolds $M^\pm$. Then if $\phi \in C(M^-)$ is a test
function, it follows from the definition (2) of the push-forward that

$\int_{M^+} \phi(G(x))\, f^+(x)\, d^n x = \int_{M^-} \phi(y)\, f^-(y)\, d^n y.$

If G were a diffeomorphism, we could combine the Jacobian factor $d^n y = |\det DG(x)|\, d^n x$
from the change of variables $y = G(x)$ with arbitrariness of $\phi \circ G$ to conclude
$f^+(x) = |\det DG(x)|\, f^-(G(x))$ for all x. We will actually see that this nonlinear equation holds
$f^+$-a.e. as a consequence of Theorem 3.2 [114] [124] [123].

In the case of Brenier's map $G(x) = Du(x)$, convexity of u implies non-negativity of
the Jacobian $DG(x) = D^2u(x) \ge 0$. It also guarantees almost everywhere differentiability
of G by Alexandrov's theorem (or by Lebesgue's theorem in one dimension); see
Theorem 3.2 for the sketch of a proof. Thus u solves the Monge-Ampère equation [TS]

(6) $f^+(x) = \det(D^2u(x))\, f^-(Du(x))$

a.e. [94] subject to the condition $Du(x) \in M^-$ for $x \in M^+$. This is known as the 2nd
boundary value problem in the partial differential equations literature. We shall see that
linearization of this equation around a convex solution leads to a (degenerate) elliptic
operator (29), whose ellipticity becomes uniform if the solution u is smooth and strongly
convex, meaning positivity of its Hessian is strict: $D^2u(x) > 0$.
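
Equation (6) can be sanity-checked in one dimension, where it reads $f^+(x) = u''(x)\, f^-(u'(x))$. The sketch below uses a hypothetical pair of densities on [0,1], uniform $f^+ = 1$ and $f^-(y) = 2y$, for which the monotone map is $G(x) = \sqrt{x}$, the derivative of the convex function $u(x) = \tfrac{2}{3}x^{3/2}$. The derivative of G is approximated by a central difference, so the residual is only expected to vanish up to discretization error.

```python
import math


def f_plus(x):
    """Source density: uniform on [0, 1] (illustrative choice)."""
    return 1.0


def f_minus(y):
    """Target density 2y on [0, 1] (illustrative choice)."""
    return 2.0 * y


def G(x):
    """Monotone map G = u' with u(x) = (2/3) x**1.5, pushing f_plus to f_minus."""
    return math.sqrt(x)


def dG(x, h=1e-6):
    """Central-difference approximation of G'(x) = u''(x)."""
    return (G(x + h) - G(x - h)) / (2.0 * h)


def monge_ampere_residual(x):
    """f^+(x) - u''(x) f^-(u'(x)), which equation (6) says should vanish."""
    return f_plus(x) - dG(x) * f_minus(G(x))


for x in (0.1, 0.3, 0.7):
    print(x, monge_ampere_residual(x))
```

Each printed residual should be close to zero; the map itself comes from matching cumulative distributions, $x = G(x)^2$.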



1.5. Applications. The Monge-Kantorovich theory has found a wide variety of
applications in pure and applied mathematics. On the pure side, these include connections
to inequalities [92] [131] [94] [32] [90] [52], geometry (including sectional [85] [73], Ricci
[55] [57] [127] [99] and mean [75] curvature), nonlinear partial differential equations P3]
[T5] [IB] [135] [59], and dynamical systems (weak KAM theory [5]; nonlinear diffusions
[109]; gradient flows [5]). On the applied side these include applications to vision
(image registration and morphing [64]), economics (equilibration of supply with demand
[42] [27], structure of cities [23], maximization of profits [118] [22] [49] or social welfare
[49]), physics [40] [129] [95] [STj, engineering (optimal shape / material design [12] [15],
reflector antenna design [53] [140] [141], aerodynamic resistance [113]), atmosphere and
ocean dynamics (the semigeostrophic theory [35] [37] [33]), biology (irrigation [TO], leaf
growth [143]), and statistics [115]. See [138] [139] for further directions, references, and
discussion.

1.6. Euclidean isoperimetric inequality. It was observed (independently by
McCann [52] [M] and Trudinger [131]) that a solution to the second boundary value problem
for the Monge-Ampère equation (6) yields a simple proof of the isoperimetric inequality
(with its sharp constant): for $M^+ \subset \mathbb{R}^n$

(7) $\mathrm{Vol}(M^+) = \mathrm{Vol}(B_1) \implies \mathcal{H}^{n-1}(\partial M^+) \ge \mathcal{H}^{n-1}(\partial B_1).$

The following streamlined argument was perfected later; it combines optimal maps with
an earlier approach from Gromov's appendix to [101].

Proof. Take $f^+ = \chi_{M^+}$ and $f^- = \chi_{B_1}$ to be uniformly distributed. Brenier's theorem
then gives a volume-preserving map $G = Du$ between $M^+$ and $B_1$:

$1 = \det{}^{1/n}(D^2u(x)).$

The expression on the right is the geometric mean of the eigenvalues of $D^2u(x)$, which
are non-negative by convexity of u, so the arithmetic-geometric mean inequality yields

(8) $1 \le$ (arithmetic mean of eigenvalues) $= \frac{1}{n}\Delta u$

almost everywhere in $M^+$. (The right hand side is the absolutely continuous part of the
distributional Laplacian; convexity of u allows it to be replaced by the full distributional
Laplacian of u without spoiling the inequality.) Integrating inequality (8) on $M^+$ yields

(9) $\mathrm{Vol}(M^+) \le \frac{1}{n}\int_{M^+} \Delta u\, d^n x = \frac{1}{n}\int_{\partial M^+} Du(x)\cdot n_{M^+}(x)\, d\mathcal{H}^{n-1}(x).$

Now, since $G = Du \in B_1$ whenever $x \in M^+$, we have $|Du| \le 1$, thus $\mathrm{Vol}(M^+) =
\mathrm{Vol}(B_1)$ gives

(10) $\mathrm{Vol}(B_1) \le \frac{1}{n}\int_{\partial M^+} 1\, d\mathcal{H}^{n-1} = \frac{1}{n}\mathcal{H}^{n-1}(\partial M^+).$

In the special case $M^+ = B_1$, Brenier's map coincides with the identity map so equalities
hold throughout (8)-(10), yielding the desired conclusion (7)! □



As the preceding proof shows, one of the important uses of optimal transportation in 
analysis and geometry is to encode non-local 'shape' information into a map which can 
be localized, reducing global geometric inequalities to algebraic inequalities under an 
integral. For subsequent developments in this direction, see works of Ambrosio, Cordero- 
Erausquin, Carrillo, Figalli, Gigli, Lott, Maggi, McCann, Nazaret, Otto, Pratelli, von 
Renesse, Schmuckenschlager, Sturm, Topping and Villani in |110j [3D] [5T] [52"] [5] 

[m pug pm [hh] EZ] us] [sg. 

1.7. Kantorovich's reformulation of Monge's problem. Now let us turn to the
proof of Brenier's theorem and the ideas it involves. A significant breakthrough was
made by Kantorovich [67] [68], who relaxed our optimization problem (the Monge
problem), by dropping the requirement that all the ore from a given mine goes to a single
factory. In other words:

Replace $G : M^+ \to M^-$ by a measure $0 \le \gamma$ on $M^+ \times M^-$ whose marginals are $\mu^+$
and $\mu^-$, respectively, and among such measures choose $\gamma$ to minimize the functional

$\mathrm{cost}(\gamma) = \int_{M^+\times M^-} c(x,y)\, d\gamma(x,y).$

Such a joint measure $\gamma$ is also known as a "transport plan" (in analogy with "transport
map"). This is better than Monge's original formulation for at least two reasons:

1) The functional to be minimized now depends linearly on $\gamma$.

2) The set $\Gamma(\mu^+, \mu^-)$ of admissible competitors $\gamma$ is a convex subset of a suitable
Banach space: namely, the dual space to continuous functions $(C(M^+ \times M^-), \|\cdot\|_\infty)$
(which decay to zero at infinity in case the compactness of $M^\pm$ is merely local).

In this context, well-known results in functional analysis guarantee existence of a
minimizer $\gamma$ under rather general hypotheses on c and $\mu^\pm$. Our primary task will be
to understand when the solution will be unique, and to characterize it. At least one
minimizer will be an extreme point of the convex set $\Gamma(\mu^+, \mu^-)$, but its uniqueness
remains an issue. Necessary and sufficient conditions will come from the duality theory
of (infinite dimensional) linear programming [6 70 .

2. Existence, uniqueness, and characterization of optimal maps 
Let's get back to the Kantorovich problem:

(11) $W_c(\mu^+, \mu^-) := \min_{\gamma \in \Gamma(\mu^+,\mu^-)} \int_{M^+\times M^-} c(x,y)\, d\gamma(x,y) = \min_{\gamma \in \Gamma(\mu^+,\mu^-)} \mathrm{cost}(\gamma).$

The basic geometric object of interest to us will be the support $\operatorname{spt} \gamma := S$ of a
competitor $\gamma$, namely the smallest closed subset $S \subset M^+ \times M^-$ carrying the full mass of $\gamma$.
What are some of the competing candidates for the minimizer?

e.g. 1) Product measure: $\mu^+ \otimes \mu^- \in \Gamma(\mu^+, \mu^-)$, for which $\operatorname{spt} \gamma = \operatorname{spt} \mu^+ \times \operatorname{spt} \mu^-$.

e.g. 2) Monge measure: if $G : M^+ \to M^-$ with $G_\#\mu^+ = \mu^-$ then $\mathrm{id} \times G : M^+ \to
M^+ \times M^-$ and $\gamma = (\mathrm{id} \times G)_\#\mu^+ \in \Gamma(\mu^+, \mu^-)$ has $\mathrm{cost}(\gamma) = \mathrm{cost}(G)$.

The second example shows in what sense Kantorovich's formulation is a relaxation of
Monge's problem, and why (4) must be at least as big as (11). In this example $\operatorname{spt} \gamma$ will
be the (closure of the) graph of $G : M^+ \to M^-$, which suggests how Monge's map G
might in principle be reconstructed from a minimizing Kantorovich measure $\gamma$. Before
attempting this, let us recall a notion which characterizes optimality in the Kantorovich
problem.

Definition 2.1 (c-cyclically monotone sets). $S \subset M^+ \times M^-$ is c-cyclically monotone
if and only if all $k \in \mathbb{N}$ and $(x_1, y_1), \ldots, (x_k, y_k) \in S$ satisfy

(12) $\sum_{i=1}^{k} c(x_i, y_i) \le \sum_{i=1}^{k} c(x_i, y_{\sigma(i)})$

for each permutation $\sigma$ of k letters.

The following result was deduced by Smith and Knott [125] from a theorem of
Rüschendorf [120]. A more direct proof was given by Gangbo and McCann [59]; its
converse is true as well.

Theorem 2.2 (Smith and Knott '92). If $c \in C(M^+ \times M^-)$, then optimality of
$\gamma \in \Gamma(\mu^+, \mu^-)$ implies $\operatorname{spt} \gamma$ is a c-cyclically monotone set.
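
For a finite set S, condition (12) can be tested directly. The sketch below is an illustration rather than an efficient algorithm: it enumerates every permutation of the whole list of pairs, which covers shorter sub-collections too because a permutation may leave the unused pairs fixed. The function name, the tolerance, and the sample pairs are hypothetical.

```python
from itertools import permutations


def is_c_cyclically_monotone(pairs, cost, tol=1e-12):
    """Check inequality (12) for a finite list of (x, y) pairs.

    Returns False as soon as some permutation sigma of the y's gives a
    strictly smaller total cost than the original pairing.
    """
    n = len(pairs)
    base = sum(cost(x, y) for x, y in pairs)
    for sigma in permutations(range(n)):
        permuted = sum(cost(pairs[i][0], pairs[sigma[i]][1]) for i in range(n))
        if base > permuted + tol:
            return False
    return True


def quadratic(x, y):
    return (x - y) ** 2


monotone_pairs = [(0.0, 0.5), (1.0, 2.0), (4.0, 5.0)]  # increasing pairing
crossing_pairs = [(0.0, 5.0), (4.0, 0.5)]              # pairing that crosses
print(is_c_cyclically_monotone(monotone_pairs, quadratic))  # True
print(is_c_cyclically_monotone(crossing_pairs, quadratic))  # False
```

For the crossing pairs, swapping the two destinations lowers the total quadratic cost from 37.25 to 1.25, so such a set cannot support an optimal plan.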

The idea of the proof in [SH] is that if $\operatorname{spt} \gamma$ is not cyclically monotone, then setting
$o_i = (x_i, y_i)$ and $z_i = (x_i, y_{\sigma(i)})$ we could with some care define a perturbation

$\gamma_\epsilon = \gamma + \epsilon\,(\text{near the } z\text{'s}) - \epsilon\,(\text{near the } o\text{'s})$

in $\Gamma(\mu^+, \mu^-)$ of $\gamma$ for which $\mathrm{cost}(\gamma_\epsilon) < \mathrm{cost}(\gamma_0)$, thus precluding the optimality of $\gamma$.

e.g. If $c(x,y) = -x \cdot y$ or $c(x,y) = \frac{1}{2}|x-y|^2$ then (12) becomes

$\sum_{i=1}^{k} \langle y_i, x_i - x_{i-1} \rangle \ge 0$

with the convention $x_0 := x_k$. This is simply called cyclical monotonicity, and can be
viewed as a discretization of

$\oint y(x) \cdot dx \ge 0,$

a necessary and sufficient condition for the vector field y(x) to be conservative, meaning 
y = Du(x). This heuristic underlies a theorem of Rockafellar [119] : 

Theorem 2.3 (Rockafellar '66). The set $S \subset \mathbb{R}^n \times \mathbb{R}^n$ is cyclically monotone if and
only if there exists a convex function $u : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ such that $S \subset \partial u$ where

(13) $\partial u := \{(x,y) \in \mathbb{R}^n \times \mathbb{R}^n \mid u(z) \ge u(x) + \langle z - x, y \rangle + o(|z - x|)\ \ \forall z \in \mathbb{R}^n\}.$

The subdifferential $\partial u$ defined by (13) consists of the set of (point, slope) pairs for
which y is the slope of a hyperplane supporting the graph of u at (x, u(x)).

Remark 2.4 (Special case (monotonicity)). Note that when $c(x,y) = -x \cdot y$ and k = 2,
(12) implies for all $(x_1, y_1), (x_2, y_2) \in S$ that

(14) $\langle \Delta x, \Delta y \rangle := \langle x_2 - x_1, y_2 - y_1 \rangle \ge 0.$

This condition implies that $y_2$ is constrained to lie in a halfspace with $y_1$ on its boundary
and $\Delta x$ as its inward normal. Should $y = Du(x)$ already be known to be conservative,
the monotonicity inequalities (14) alone become equivalent to convexity of u.

2.1. Linear programming duality. An even more useful perspective on these linear
programming problems is given by the duality theorem discovered by Kantorovich
[57] and Koopmans [75], for which they later shared the Nobel Memorial Prize in
economics. It states that our minimization problem is equivalent to a maximization
problem

(15) $\min_{\gamma \in \Gamma(\mu^+,\mu^-)} \int_{M^+\times M^-} c\, d\gamma = \sup_{-(u,v) \in \mathrm{Lip}_c} -\int_{M^+} u(x)\, d\mu^+(x) - \int_{M^-} v(y)\, d\mu^-(y).$

Here

$\mathrm{Lip}_c = \{(u^+, u^-)$ with $u^\pm \in L^1(M^\pm, d\mu^\pm) \mid c(x,y) \ge u^+(x) + u^-(y)\ \ \forall (x,y) \in M^+ \times M^-\}.$

One of the two inequalities ($\ge$) in (15) follows at once from the definition of
$-(u,v) \in \mathrm{Lip}_c$ by integrating

$c(x,y) \ge -u(x) - v(y)$

against $\gamma \in \Gamma(\mu^+, \mu^-)$. The magic of duality is that equality holds in (15).



2.2. Game theory. Some intuition for why this magic works can be gleaned from the
theory of (two-player, zero-sum) games. In that context, Player 1 chooses strategy
$x \in X$, Player 2 chooses strategy $y \in Y$, and the outcome is that Player 1 pays P(x,y)
to Player 2. The payoff function $P \in C(X \times Y)$ is predetermined and known in advance
to both players; P1 wants to minimize the resulting payment and P2 wants to maximize
it.

Now, what if one of the players declares his or her strategy (x or y) to the other player
in advance? If P1 declares first, the outcome is better for P2, who has a chance to
optimize his response to the announced strategy x, and conversely. This implies that

(16) $\inf_{x \in X} \sup_{y \in Y} P(x,y) \ge \sup_{y \in Y} \inf_{x \in X} P(x,y);$

(Player 1 declares first vs player 2 declares first.) Von Neumann [107] identified
structural conditions on the payoff function to have a saddle point (17), in which case equality
holds in (16); see also Kakutani's reference to [108] in [55].



Theorem 2.5 (convex/concave min-max). If $X \subset \mathbb{R}^m$ and $Y \subset \mathbb{R}^n$ are compact and
convex, then equality holds in (16) provided for each $(x_0, y_0) \in X \times Y$ both functions
$x \in X \mapsto P(x, y_0)$ and $y \in Y \mapsto -P(x_0, y)$ are convex. (In fact, convexity of all
sublevel sets of both functions is enough.)

Proof. Let

$x_b(y) \in \arg\min_{x \in X} P(x,y)$ and $y_b(x) \in \arg\max_{y \in Y} P(x,y)$

denote the best responses of P1 and P2 to each other's strategies y and x. Note $x_b$ and
$y_b$ are continuous if the convexity and concavity assumed of the payoff function are both
strict. In that case, Brouwer's theorem asserts that the function $y_b \circ x_b : Y \to Y$ has a
fixed point $y_0$. Setting $x_0 = x_b(y_0)$, since $y_0 = y_b(x_0)$ we have found a saddle point

(17) $\inf_{x \in X} \max_{y \in Y} P(x,y) \le \max_{y \in Y} P(x_0, y) = P(x_0, y_0) = \min_{x \in X} P(x, y_0) \le \sup_{y \in Y} \min_{x \in X} P(x,y)$

of the payoff function, which proves equality holds in (16). If the convexity and concavity
of the payoff function are not strict, apply the theorem just proved to the perturbed
payoff $P_\epsilon(x,y) = P(x,y) + \epsilon(|x|^2 - |y|^2)$ and take the limit $\epsilon \to 0$. □

e.g. Expected payoff [107]: von Neumann's original example of a function to which the
theorem and its conclusion applies is the expected payoff $P(x,y) = \sum_{i=1}^{m} \sum_{j=1}^{n} p_{ij} x_i y_j$
of mixed or randomized strategies x and y for a game in which P1 and P2 each have
only finitely many pure strategies, and the payoff corresponding to strategy $1 \le i \le m$
and $1 \le j \le n$ is $p_{ij}$. In this case $X = \{x \in [0,1]^m \mid \sum x_i = 1\}$ and $Y = \{y \in
[0,1]^n \mid \sum y_j = 1\}$ are standard simplices of the appropriate dimension, whose vertices
correspond to the pure strategies.
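
The gap in (16) and its closure by mixed strategies can be seen on the smallest example. The sketch below uses the hypothetical "matching pennies" payoff matrix (not taken from the text): in pure strategies the two sides of (16) differ, while the uniform mixed strategies form a saddle point of the expected payoff with value 0.

```python
def pure_strategy_values(p):
    """Return (inf_x sup_y, sup_y inf_x) over pure strategies for matrix p.

    Rows are P1's strategies (P1 pays, so minimizes); columns are P2's.
    """
    rows, cols = range(len(p)), range(len(p[0]))
    upper = min(max(p[i][j] for j in cols) for i in rows)  # P1 declares first
    lower = max(min(p[i][j] for i in rows) for j in cols)  # P2 declares first
    return upper, lower


def expected_payoff(p, x, y):
    """P(x, y) = sum_ij p_ij x_i y_j for mixed strategies x and y."""
    return sum(p[i][j] * x[i] * y[j]
               for i in range(len(x)) for j in range(len(y)))


pennies = [[1.0, -1.0], [-1.0, 1.0]]  # illustrative payoff matrix
print(pure_strategy_values(pennies))  # (1.0, -1.0): strict inequality in (16)

half = [0.5, 0.5]
# Against the uniform mix, every pure reply of the opponent yields payoff 0.
print([expected_payoff(pennies, half, e) for e in ([1.0, 0.0], [0.0, 1.0])])
print([expected_payoff(pennies, e, half) for e in ([1.0, 0.0], [0.0, 1.0])])
```

Since the expected payoff is linear in each argument, checking pure replies is enough to confirm that neither player can improve on the uniform mix.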

2.3. Relevance to optimal transport: Kantorovich-Koopmans duality. Infinite
dimensional versions of von Neumann's theorem can also be formulated where X and
Y lie in Banach spaces; they are proved using Schauder's fixed point theorem instead of
Brouwer's. A payoff function germane to optimal transportation is defined on the
strategy spaces $X = \{0 \le \gamma \text{ on } M^+ \times M^-\}$ and $Y = \{(u,v) \in L^1(M^+, \mu^+) \oplus L^1(M^-, \mu^-)\}$
by

$P(\gamma, (u,v)) = \int_{M^+\times M^-} (c(x,y) + u(x) + v(y))\, d\gamma(x,y) - \int_{M^+} u\, d\mu^+ - \int_{M^-} v\, d\mu^-.$

Note the bilinearity of P on $X \times Y$. Since

$\inf_{\gamma \in X} P(\gamma, (u,v)) = -\infty$ unless $(-u,-v) \in \mathrm{Lip}_c$, and $= -\int u\, d\mu^+ - \int v\, d\mu^-$ otherwise,

the Kantorovich-Koopmans dual problem is recovered from the version of the game in
which P2 is compelled to declare his strategy first:

(18) $\sup_{(u,v) \in Y} \inf_{\gamma \in X} P(\gamma, (u,v)) = \sup_{(-u,-v) \in \mathrm{Lip}_c} \int_{M^+} (-u)\, d\mu^+ + \int_{M^-} (-v)\, d\mu^-.$

On the other hand, rewriting

$P(\gamma, (u,v)) = \int_{M^+\times M^-} c\, d\gamma + \int_{M^+\times M^-} u(x)\,(d\gamma(x,y) - d\mu^+(x)) + \int_{M^+\times M^-} v(y)\,(d\gamma(x,y) - d\mu^-(y))$

we see

(19) $\sup_{(u,v) \in Y} P(\gamma, (u,v)) = +\infty$ unless $\gamma \in \Gamma(\mu^+, \mu^-)$, and $= \int_{M^+\times M^-} c\, d\gamma$ if $\gamma \in \Gamma(\mu^+, \mu^-)$.

Thus the primal transportation problem of Kantorovich and Koopmans

$\inf_{\gamma \in X} \sup_{(u,v) \in Y} P(\gamma, (u,v)) = \inf_{\gamma \in \Gamma(\mu^+,\mu^-)} \int_{M^+\times M^-} c\, d\gamma$

corresponds to the version of the game in which P1 declares his strategy first. The
equality between (18) and (19) asserted by an appropriate generalization of von Neumann's
theorem implies the duality (15):

$\min_{\gamma \in \Gamma(\mu^+,\mu^-)} \int_{M^+\times M^-} c\, d\gamma = \sup_{(-u,-v) \in \mathrm{Lip}_c} \int_{M^+} (-u)\, d\mu^+ + \int_{M^-} (-v)\, d\mu^-.$

2.4. Characterizing optimality by duality. The following theorem can be deduced
as an immediate corollary of this duality. We may think of the potentials u and v
as being Lagrange multipliers enforcing the constraints on the marginals of $\gamma$; in the
economics literature they are interpreted as shadow prices which reflect the geographic
variation in scarcity or abundance of supply and demand. The geography is encoded in
the choice of cost.

Theorem 2.6 (Necessary and sufficient conditions for optimality). The existence of
$-(u,v) \in \mathrm{Lip}_c$ such that $\gamma$ vanishes outside the zero set of the non-negative function
$k(x,y) = c(x,y) + u(x) + v(y) \ge 0$ on $M^+ \times M^-$ is necessary and sufficient for the
optimality of $\gamma \in \Gamma(\mu^+, \mu^-)$ with respect to $c \in C(M^+ \times M^-)$.

Corollary 2.7 (First and second order conditions on potentials). Optimality of $\gamma$ and
(u,v) implies $Dk = 0$ and $D^2k \ge 0$ at any point $(x,y) \in \operatorname{spt} \gamma$ where these derivatives
exist. In particular, $D_x[c(x,y) + u(x) + v(y)] = 0$ and $D^2_x[c(x,y) + u(x) + v(y)] \ge 0$
holds $\gamma$-a.e., and likewise for y-derivatives.

e.g. Consider the special case of the bilinear cost: $c(x,y) = -x \cdot y$. Here the first and
second order conditions of the corollary become

$y = Du(x)$ and $D^2u(x) \ge 0$,

suggesting y is the graph of the gradient of a convex function. In this case, convexity
of u guarantees $D^2u$ is defined a.e. with respect to Lebesgue measure, by Alexandrov's
theorem.

2.5. Existence of optimal maps and uniqueness of optimal measures. More
generally, we claim u inherits Lipschitz and semiconvexity bounds (23)-(24) from c(x,y),
which guarantee the existence of x-derivatives in the preceding corollary, at least
Lebesgue almost everywhere. This motivates the following theorem of Gangbo [57] and
Levin [5U]; variations appeared independently in Caffarelli jTHj, Gangbo and McCann
[55] [ST)], and Rüschendorf [121] [122] at around the same time, and subsequently in [SJJ].

Definition 2.8 (Twist conditions). A function $c \in C(M^+ \times M^-)$ differentiable with
respect to $x \in M^+$ is said to be twisted if

(20) (A1)$^+$ for all $x \in M^+$, the map $y \in M^- \mapsto D_x c(x,y) \in T^*_x M^+$ is one-to-one.

For $(x,p) \in T^*M^+$ denote the unique $y \in M^-$ solving $D_x c(x,y) + p = 0$ by $y = Y(x,p)$
when it exists. When the same condition holds for the cost $\tilde{c}(y,x) := c(x,y)$, we denote
it by (A1)$^-$. When both c and $\tilde{c}$ satisfy (A1)$^+$, we say the cost is bi-twisted, and denote
this by (A1).

Theorem 2.9 (Existence of Monge solutions; uniqueness of Kantorovich solutions).
Fix Polish probability spaces $(M^\pm, \mu^\pm)$ and assume $M^+$ is an n-dimensional manifold
and $d\mu^+ \ll d^n x$ is absolutely continuous (in coordinates). Let $c \in C(M^+ \times M^-)$
differentiable with respect to $x \in M^+$ satisfy the twist condition (20) and assume $D_x c(x,y)$ is
bounded locally in $x \in M^+$ uniformly in $y \in M^-$. Then, there exists a locally Lipschitz
(moreover, c-convex, as in Definition 2.10) function $u : M^+ \to \mathbb{R}$ such that

a) $G(x) := Y(x, Du(x))$ pushes $\mu^+$ forward to $\mu^-$;

b) this map is unique, and uniquely solves Monge's minimization problem (3);

c) Kantorovich's minimization (11) has a unique solution $\gamma$;

d) $\gamma = (\mathrm{id} \times G)_\#\mu^+$.

Definition 2.10 (c-convex). A function $u : M^+ \to \mathbb{R} \cup \{+\infty\}$ (not identically infinite)
is c-convex if and only if $u = (u^c)^c$, where

(21) $u^c(y) = \sup_{x \in M^+} -c(x,y) - u(x)$ and $v^c(x) = \sup_{y \in M^-} -c(x,y) - v(y).$

Remark 2.11 (Legendre-Fenchel transform and convex dual functions). When $c(x,y) =
-\langle x, y \rangle$, then $u^c(y)$ is manifestly convex: it is the Legendre-Fenchel transform or convex
dual function of u(x). In this case, $(u^c)^c$ is well-known to yield the lower semicontinuous
convex hull of the graph of u, so that $u = (u^c)^c$ holds if and only if u is already lower
semicontinuous and convex. More generally, we interpret the condition $u = u^{cc}$ as being
the correct adaptation of the notion of convexity to the geometry of the cost function c.
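
On finite sets the transforms in (21) become maxima over lists, which makes their basic properties easy to check numerically. The following sketch uses hypothetical sample points, a quadratic cost, and an arbitrary starting function v; it verifies that $v^{cc} \le v$, that $v^{ccc} = v^c$, and that the pair $-(v^c, v)$ satisfies the constraint defining $\mathrm{Lip}_c$. The helper names are assumptions introduced for illustration.

```python
def transform_to_x(v, xs, ys, cost):
    """v^c(x) = max over y of [-c(x, y) - v(y)], evaluated at each x in xs."""
    return [max(-cost(x, y) - vy for y, vy in zip(ys, v)) for x in xs]


def transform_to_y(u, xs, ys, cost):
    """u^c(y) = max over x of [-c(x, y) - u(x)], evaluated at each y in ys."""
    return [max(-cost(x, y) - ux for x, ux in zip(xs, u)) for y in ys]


def quadratic(x, y):
    return (x - y) ** 2


# Hypothetical finite landscapes and an arbitrary function v on ys.
xs = [0.0, 1.0, 2.0]
ys = [0.5, 1.5, 4.0]
v = [0.3, -0.2, 5.0]

v_c = transform_to_x(v, xs, ys, quadratic)       # function on xs
v_cc = transform_to_y(v_c, xs, ys, quadratic)    # function on ys
v_ccc = transform_to_x(v_cc, xs, ys, quadratic)  # function on xs

print(all(a <= b + 1e-12 for a, b in zip(v_cc, v)))         # v^cc <= v
print(all(abs(a - b) < 1e-9 for a, b in zip(v_ccc, v_c)))   # v^ccc == v^c
print(all(quadratic(x, y) + ux + vy >= -1e-12               # c + v^c + v >= 0
          for x, ux in zip(xs, v_c) for y, vy in zip(ys, v)))
```

All three checks should print True; the first inequality can be strict, which is how a function that is not c-convex reveals itself.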

Sketch of proof of Theorem 2.9. The key idea of the proof is to establish existence of a
maximizer $-(u,v) \in \mathrm{Lip}_c$ of (15) with the additional property that $(u,v) = (v^c, u^c)$.
Differentiability of $u = u^{cc}$ on a set dom Du of full $d\mu^+ \ll d^n x$ measure then follows from
Rademacher's theorem and Lemma 3.1. The map $G(x) := Y(x, Du(x))$ is well-defined
on dom Du by the twist condition (assuming the supremum (21) defining $(u^c)^c(x)$ is
attained). Corollary 2.7 shows any minimizer $\gamma$ vanishes outside the graph of this map,
and it then follows easily that $\gamma = (\mathrm{id} \times G)_\#\mu^+$ and hence $\gamma$ is uniquely determined by u
[2]. Conversely any other c-convex $\tilde{u}$ for which $\tilde{G}(x) = Y(x, D\tilde{u}(x))$ pushes $\mu^+$ forward
to $\mu^-$ can be shown to maximize the dual problem by checking that $\tilde{\gamma} = (\mathrm{id} \times \tilde{G})_\#\mu^+$
vanishes outside the support of G. Thus $\tilde{\gamma} = \gamma$ and $\tilde{G} = G$ holds $\mu^+$-a.e.

To extract the desired $-(u,v) \in \mathrm{Lip}_c$ from a maximizing sequence $-(u_k, v_k)$ requires
some compactness. (This would come from the convexity of $u_k$ and $v_k$ in case $c(x,y) =
-x \cdot y$ via the Blaschke selection theorem.) Observe $-(u,v) \in \mathrm{Lip}_c$ implies

$u(x) \ge \sup_{y \in M^-} -c(x,y) - v(y) =: v^c(x).$

Moreover, $-(v^c, v) \in \mathrm{Lip}_c$ and $-(v^c) \ge -u$ can only increase the value of the objective
functional relative to $-(u,v)$. Thus $-(v^c, v)$ is a better candidate for a maximizer than
$-(u,v)$. Repeating the process shows $-(v^c, v^{cc})$ and $-(v^{ccc}, v^{cc}) \in \mathrm{Lip}_c$ are better
still, since $(v^c)^c \le v$ and $(v^{cc})^c \le v^c$ by the same logic. On the other hand, starting
from $v^{cc} \le v$, the negative coefficient in definition (21) implies the opposite inequality
$(v^{cc})^c \ge v^c$. Thus $v^{ccc} = v^c$ generally. (This is precisely analogous to the fact that
the second Legendre transform $u^{**}$ does not change a function $u = v^*$ which is already
convex and lower semicontinuous; see Remark 2.11.)

Replacing a maximizing sequence $-(u_k, v_k)$ with $-(v_k^c, v_k^{cc})$ therefore yields a new
maximizing sequence at least as good which moreover consists of c-convex functions.
Lemma 3.1 shows this new family is locally equi-Lipschitz, hence we only need local
boundedness for the Arzelà-Ascoli theorem to yield a limiting maximizer $-(u,v)$, which
will in fact be c-convex, though we can also replace it by $-(v^c, v^{cc})$ just to be sure.
Local boundedness also follows from Lemma 3.1 after fixing $x_0 \in \operatorname{spt} \mu^+ \subset M^+$ and
replacing $-(u_k, v_k)$ by $-(u_k - \lambda_k, v_k + \lambda_k)$ with $\lambda_k = u_k(x_0)$. This replacement does
not change the value of the objective functional (18), yet ensures that $u(x_0) = 0$. □



3. Methods for obtaining regularity of optimal mappings 

Given mines and factories (M ± ,fi ± ) and a cost function c £ C(M + x M~), in the 
preceding section we found conditions which guarantee the existence and uniqueness of 
a map G(x) = Y(x, Du(x)) such that G#/i + = [iT with u = u cc , ie. c-convex. Under the 
same conditions, the map G is the unique minimizer of Monge's problem Q. The space 
M + was assumed to be an 71-dimensional manifold, and the following twist hypothesis 



(Al) , equivalent to (20 1, was crucial to specifying Y(x, • 



(22) V yi 7^ J/2 € M assume x £ M + \ — > c(x, y\) — c(x, j/2) has no critical points. 



Notice, however, that (22) cannot be satisfied by any cost function which is differentiable
throughout a compact manifold $M^+$. In case $M^+ = S^n$, Monge solutions do not
generally exist [60], but criteria are given in [27] [2] which guarantee uniqueness of the
Kantorovich minimizer. On the other hand, it is an interesting open problem to find a
criterion on $c \in C^1(M^+ \times M^-)$ which guarantees uniqueness of Kantorovich solutions
for all $\mu^\pm \in L^1(M^\pm)$ in more complicated topologies, such as the torus $M^\pm = T^n$ for
example. Here differentiability of the cost function is crucial; for costs such as Riemannian
distance squared, the desired uniqueness is known [53] [HZ], but the cost fails to be
differentiable at the cut locus.
3.1. Rectifiability: differentiability almost everywhere. The current section is
devoted to reviewing methods for exploring the smoothness properties of the optimal
map G found above, or equivalently of its c-convex potential u. The following lemma
shows that all c-convex functions inherit Lipschitz and semiconvexity properties directly
from the cost function c; it has already been exploited to prove Theorem 2.9.



Lemma 3.1 (Inherent regularity of c-convex functions). If $u = u^{cc}$ and $c(\cdot,y) \in C^k_{loc}(M^+)$
for each $y \in M^-$, then $k = 1$ implies (23) and $k = 2$ implies (24):

(23) $|Du(x)| \le \sup_{y \in M^-} |D_x c(x,y)|$ (local Lipschitz regularity);

(24) $D^2 u(x) \ge \inf_{y \in M^-} -D^2_{xx} c(x,y)$ (semiconvexity).
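Both bounds of Lemma 3.1 can be observed on a discretized example. The sketch below assumes the one-dimensional quadratic cost $c(x,y) = (x-y)^2/2$ on a uniform grid in $[0,1]$, for which $\sup|D_x c| = 1$ and $\inf -D^2_{xx}c = -1$; the grid, the random potential, and the function names are hypothetical illustrations rather than anything taken from the lectures.

```python
import random

def c_transform(xs, ys, v, cost):
    """u(x) = max_y [-c(x, y) - v(y)], evaluated at each grid point x."""
    return [max(-cost(x, y) - vy for y, vy in zip(ys, v)) for x in xs]

def check_lemma(n=200, seed=1):
    rng = random.Random(seed)
    grid = [i / (n - 1) for i in range(n)]
    h = grid[1] - grid[0]
    cost = lambda x, y: 0.5 * (x - y) ** 2     # assumed cost; |D_x c| <= 1 on [0,1]^2
    v = [rng.uniform(-1.0, 1.0) for _ in grid]  # arbitrary potential on the y-grid
    u = c_transform(grid, grid, v, cost)        # a c-convex function on the x-grid
    # Discrete analogue of (23): difference quotients bounded by sup |D_x c| = 1.
    max_slope = max(abs(u[i + 1] - u[i]) / h for i in range(n - 1))
    # Discrete analogue of (24): second differences bounded below by -1.
    min_curvature = min((u[i + 1] - 2 * u[i] + u[i - 1]) / h ** 2
                        for i in range(1, n - 1))
    return max_slope, min_curvature
```

Up to rounding, `max_slope` should not exceed 1 and `min_curvature` should not fall below -1, however rough the starting potential `v` is.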

Similarly, c-cyclically monotone sets $S \subset M^+ \times M^-$ turn out to be contained
in Lipschitz submanifolds of dimension $n = \dim M^\pm$ when the cost function is
non-degenerate (25). The following recent theorem of McCann-Pass-Warren [100] combines
with Rademacher's theorem, which asserts the differentiability Lebesgue a.e. of Lipschitz
functions, to give a simple tool for establishing that $f^+(x) = |\det(DG(x))|\, f^-(G(x))$
holds $f^+$-a.e.



Theorem 3.2 (Rectifiability of optimal transport [100]). Assume $M^\pm$ are n-dimensional
manifolds, at least in a neighbourhood U of $(x_0,y_0) \in M^+ \times M^-$, where $c \in C^2(U)$ and

(25) (A2) $\det D^2_{x^i y^j} c(x_0,y_0) \ne 0$.

If $S \subset M^+ \times M^-$ is c-cyclically monotone, then $S \cap V$ lies in an n-dimensional Lipschitz
submanifold, for some neighbourhood $V \subset U$ of $(x_0,y_0)$.



In view of Theorem 2.2 this conclusion applies either to the graph $S = \mathrm{Graph}(G)$
of any optimal map or the support $S = \mathrm{spt}(\gamma)$ of any optimal measure (11) in the
transportation problem.



Figure 1: Optimizers have locally monotone support S in the plane (axes $M^+ = \mathbf{R}$ and $M^-$).
Motivation: on the line $M^\pm = \mathbf{R}$, without further assumptions on c or $\mu^\pm$, a transport
map may not exist nor be monotone, yet the theorem above says that even so all pieces
of $\mathrm{spt}(\gamma)$ lie along Lipschitz arcs in the plane. These curves will actually be locally
monotone: non-decreasing or non-increasing depending on the sign of $D^2_{xy} c(x_0,y_0)$;
see Figure 1.

Idea of proof. Introduce the notation $b = -c$. In case $c(x,y) = -x \cdot y$ on $M^\pm = \mathbf{R}^n$,
monotonicity asserts for all $(x_0,y_0), (x_1,y_1) \in S$ that $\Delta x = x_1 - x_0$ and $\Delta y = y_1 - y_0$
satisfy

$0 \le \langle \Delta x, \Delta y \rangle = \Big\langle \frac{\Delta z + \Delta w}{\sqrt{2}}, \frac{\Delta z - \Delta w}{\sqrt{2}} \Big\rangle$

where

(26) $(z,w) := \Big( \frac{x+y}{\sqrt{2}}, \frac{x-y}{\sqrt{2}} \Big)$.

This implies that $|\Delta w|^2 \le |\Delta z|^2$, meaning $w = w(z)$ has Lipschitz constant 1 as a graph
over $z \in \mathbf{R}^n$. Equivalently, S has Lipschitz constant 1 as a graph over the diagonal in
$M^+ \times M^-$. This special case was established by Alberti and Ambrosio, using an
argument of Minty [102].



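For more general costs $b = -c$ and any $\epsilon > 0$, the non-degeneracy (25) implies
the existence of new coordinates $\tilde y = \tilde y(y)$ on $M^-$ in a neighbourhood of $y_0$ such that
$\tilde b(x,\tilde y(y)) = b(x,y)$ satisfies $|D^2_{x\tilde y}\tilde b(x,\tilde y) - I| < \epsilon$ in a neighbourhood V of $(x_0,\tilde y_0)$ which
is convex in coordinates.

Now, $\langle \Delta x, \Delta \tilde y \rangle \ge -\epsilon|\Delta x||\Delta \tilde y|$ follows from

(27) $0 \le \tilde b(x_0,\tilde y_0) + \tilde b(x_1,\tilde y_1) - \tilde b(x_0,\tilde y_1) - \tilde b(x_1,\tilde y_0) = D^2_{x\tilde y}\tilde b(x^*,\tilde y^*)(x_1 - x_0)(\tilde y_1 - \tilde y_0)$

and the change of variables analogous to (26) yields

$|\Delta z|^2 - |\Delta w|^2 \ge -\epsilon|\Delta w - \Delta z||\Delta w + \Delta z| \ge -\epsilon\,(|\Delta w|^2 + |\Delta z|^2)$.

Thus $(1+\epsilon)|\Delta z|^2 \ge (1-\epsilon)|\Delta w|^2$, which shows $w = w(z)$ is again a Lipschitz function
of $z \in \mathbf{R}^n$ in the chosen coordinates. □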
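The special case $c(x,y) = -x \cdot y$ of this argument is easy to test numerically. The sketch below assumes a particular smooth convex function on $\mathbf{R}^2$ (chosen only for illustration), samples points on the graph of its gradient, which is a monotone set, and measures the Lipschitz ratio $|\Delta w|/|\Delta z|$ in the rotated coordinates (26).

```python
import math
import random

def grad_u(x):
    """Gradient of the convex function u(x) = x.Ax/2 + sum_i log cosh(x_i),
    with A = [[2, 1], [1, 1]]; this u is an assumption made for the demo."""
    x1, x2 = x
    return (2 * x1 + x2 + math.tanh(x1), x1 + x2 + math.tanh(x2))

def rotate(x, y):
    """Coordinates (26): z = (x + y)/sqrt(2), w = (x - y)/sqrt(2)."""
    s = math.sqrt(2.0)
    z = tuple((a + b) / s for a, b in zip(x, y))
    w = tuple((a - b) / s for a, b in zip(x, y))
    return z, w

def worst_lipschitz_ratio(num_points=200, seed=0):
    rng = random.Random(seed)
    pts = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(num_points)]
    zw = [rotate(x, grad_u(x)) for x in pts]  # points of S = Graph(Du), rotated
    worst = 0.0
    for i in range(num_points):
        for j in range(i):
            dz = math.dist(zw[i][0], zw[j][0])
            dw = math.dist(zw[i][1], zw[j][1])
            worst = max(worst, dw / dz)       # |dw| <= |dz| is the claim
    return worst
```

Monotonicity $\langle \Delta x, \Delta y \rangle \ge 0$ forces the returned ratio to stay at or below 1, in line with the statement that $w = w(z)$ has Lipschitz constant 1.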

3.2. From regularity a.e. to regularity everywhere. The regularity results discussed
so far (Lipschitz continuity of the potential u, and of $\mathrm{Graph}(G) \subset M^+ \times M^-$
rather than of the map $G(x) = Y(x, Du(x))$ itself) required no hypotheses on the probability
measures $\mu^- = G_\#\mu^+$. To address the continuity, differentiability, and higher
regularity everywhere for the map $G : M^+ \to M^-$ is a much more delicate issue which
certainly requires further hypotheses on the data $\mu^\pm$ and c. For example, if $\mathrm{spt}\,\mu^+$ is
connected but $\mathrm{spt}\,\mu^-$ is not, then G cannot be continuous. The same reasoning makes it
clear that ellipticity of the Monge-Ampère equation cannot be non-degenerate for all
convex solutions; regularity must propagate from boundary conditions since the purely
local effect of the equation is insufficient to conclude $u \in C^1_{loc}$. It is often easier to work
with the scalar potential u rather than the mapping G; we shall see this reduces the
problem to a question in the theory of second-order, fully-nonlinear, degenerate-elliptic
partial differential equations (33) generalizing the Monge-Ampère equation. However,
this question was answered first in the special case $c(x,y) = -x \cdot y$ by Delanoë in the
plane $n = 2$ [38], and by Caffarelli and Urbas in higher
dimensions $M^\pm = \mathbf{R}^n$ [TS] [T7j [T5] [155].

Remark 3.3. Note for $c \in C^{k+1}(M^+ \times M^-)$ that $u \in C^{k+1}$ implies $G \in C^k$ by the
following remark. Whereas the twist condition (A1)$^+$ asserts that the definition $Y(x,p)$
by $D_x c(x, Y(x,p)) + p = 0$ is unambiguous, non-degeneracy (A2) allows the implicit
function theorem to be applied to conclude $C^k$ smoothness of $Y(x,p)$ (where defined).

3.3. Regularity methods for the Monge-Ampère equation; renormalization.
There are several methods for obtaining regularity results for convex solutions of the
Monge-Ampère equation. The first to be discussed here is the continuity method, used
for example by Delanoë '91 and Urbas '97. This approach requires relatively strong
assumptions on the smoothness of the measures $d\mu^\pm = f^\pm d\mathrm{Vol}$ and on the convexity
and smoothness of their domains $M^\pm \subset \mathbf{R}^n$. When it applies, it yields global regularity
of the resulting potential up to the boundary $\partial M^+$ from the same fixed point argument
which shows a solution exists. The second method is the renormalization method
pioneered by Caffarelli '92-'96, which starts from the unique (weak) solution to the 2nd
boundary value problem and uses affine invariance of the equation to blow up the
solution near a putative singularity and derive a contradiction in the limit. This method
is quite flexible: it has the advantage of yielding certain conclusions under weaker
assumptions on the data, and has therefore proven useful for addressing such phenomena
as the free boundary which arises in partial transport problems, where the densities $f^\pm$
need not be continuous and are constrained but not specified a priori [20]. Using this
method, Caffarelli '92 was able to prove the following regularity result on the interior
$M^+_{int}$ of $M^+ \subset \mathbf{R}^n$:

Theorem 3.4 (Local regularity [18]). Fix $c = -x \cdot y$ and let $M^- \subset \mathbf{R}^n$ be convex.
Assume $\mu^\pm = f^\pm dx$ with $\log f^\pm \in L^\infty(M^\pm)$. If $\log f^\pm \in W^{k,\infty}_{loc}(M^\pm_{int})$, there exists $\alpha \in
(0,1)$ (depending only on n, k and the bounds on $\log f^\pm$) such that $u \in C^{k+1,\alpha}_{loc}(M^+_{int})$,
where u is the convex function with $Du(M^+) \subset M^-$ such that $Du_\#\mu^+ = \mu^-$.



Remark 3.5 (Degenerate ellipticity). As shown also in [18], when one drops the convexity
assumption on $M^-$ the gradient map may be discontinuous at interior points. This goes
hand in hand with the claim made in the previous subsection that regularity (even in the
interior) must propagate from the boundary.

Remark 3.6 (Local versus global regularity). This is a local regularity result. Global
regularity (up to $\partial M^+$) requires both domains $M^\pm$ to be strongly convex and smooth
(Caffarelli '96) [TJ]]. Here strong convexity means the principal curvatures (or second
fundamental form) of the domain boundaries should be positive-definite.

Remark 3.7 (Higher regularity via uniformly elliptic linearization). The cases of primary
relevance are $\log f^\pm$ merely bounded and measurable ($k = 0$), and $\log f^\pm$ also locally
Lipschitz ($k = 1$). Once $u \in C^{2,\alpha}_{loc}$ has been deduced from these assumptions, higher
regularity in the interior of $M^\pm$ follows from uniform ellipticity of the Monge-Ampère
equation

(28) $0 = \log(\det(D^2 u(x) + \tau D^2 w(x))) + \log(f^-(Du(x) + \tau Dw(x))) - \log(f^+(x))$.

For example, linearizing this equation at $\tau = 0$ yields the equation

(29) $0 = \mathrm{Tr}(D^2 u(x)^{-1} D^2 w(x)) + D\log f^-\big|_{Du(x)} \cdot Dw(x)$,

which must be satisfied by spatial derivatives $w = D_{x^i} u$ of u. Convexity combines
with $u \in C^2_{loc}$ and the equation itself to bound $\|\Lambda^{n-1} f^-/f^+\|_{L^\infty}^{-1} \le D^2 u(x) \le \Lambda$
on compact subsets of $M^+_{int}$. The derivatives of u thus satisfy a uniformly elliptic
linear equation (29) with Hölder continuous coefficients, so Schauder estimates [62] and
bootstrapping yield as much regularity as can be expected when $k \ge 2$.
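The passage from (28) to (29) rests on the standard formula for the derivative of a log-determinant. As a brief worked step, writing $A = D^2u(x)$ and $B = D^2w(x)$ (shorthand introduced here only for this computation):

```latex
\begin{align*}
\frac{d}{d\tau}\Big|_{\tau=0} \log\det(A + \tau B)
  &= \operatorname{Tr}\big(A^{-1}B\big), \\
\frac{d}{d\tau}\Big|_{\tau=0} \log f^-\big(Du(x) + \tau Dw(x)\big)
  &= D\log f^-\big|_{Du(x)} \cdot Dw(x).
\end{align*}
```

Adding the two lines, and noting that $\log f^+(x)$ does not depend on $\tau$, gives the left-hand operator of (29). Invertibility of $A$ is what the two-sided bound on $D^2u$ supplies.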

3.4. The continuity method (schematic) (cf. Delanoë '91, Urbas '97). To apply
the continuity method, we assume $M^\pm \subset \mathbf{R}^n$ are smooth and strongly convex, and

$\log f^\pm \in C^{2,\alpha}(M^\pm) \cap L^\infty$.

Choose a dilation by $\epsilon > 0$ sufficiently small and translation $M_0 = G_0(M^+)$ of $M^+$
by $x_0 \in \mathbf{R}^n$ such that $M_0 \subset\subset M^-$. Let $\mu_0 := (G_0)_\#\mu^+$ be the push-forward of $\mu^+$
through the corresponding dilation and translation $G_0(x) = \epsilon x - x_0$. Notice that $G_0$ is
the gradient of the smooth convex function $u_0(x) = \epsilon|x|^2/2 - x_0 \cdot x$, and as such gives the
optimal map between $\mu^+$ and $\mu_0$. The idea behind the continuity method is to construct
a family of target measures $d\mu_t = f_t\, d\mathrm{Vol}$ interpolating between $d\mu_0$ and $d\mu_1 := d\mu^-$,
and to study the set T of $t \in [0,1]$ for which the optimal transportation problem of
Brenier admits a solution with a convex potential $u_t \in C^{2,\alpha}(M^+)$. The interpolating
measures must be constructed so that the $C^{2,\alpha}(M^\pm) \cap L^\infty$ norms of $\log f_t$, and the
strong convexity and smoothness of $M_t = \mathrm{spt}\, f_t$, can be quantified independently of
$t \in [0,1]$. We then hope to show $T \subset [0,1]$ is both open and closed. If so, it must
exhaust the entire interval (since $0 \in T$), therefore $1 \in T$ as desired.

Closed: To show closedness of this set requires an a priori estimate of the form
$\|u_t\|_{C^{2,\alpha}(M^+)} \le C(\|\log f^\pm\|_{C^{2,\alpha}(M^\pm)}, \|\partial M^\pm\|_{C^{2,\alpha}}, \text{strong convexity})$ for any smooth
solution $u_t \in C^4(M^+)$ of the 2nd boundary value problem

(30) $\det D^2 u_t(x) = \frac{f^+(x)}{f_t(Du_t(x))}$ with $Du_t(M^+) \subset M_t$.

Such estimates are delicate, but can be obtained by differentiating the equation twice,
and constructing barriers. Once obtained, they imply that if $t_k \in T$ and $t_k \to t_\infty$ then
$t_\infty \in T$ also; the corresponding solutions $u_{t_k}$ belong to $C^4(M^+)$ as in Remark 3.7.



Open: The fact that T is open is shown using an implicit function theorem in Banach
spaces. This requires knowing that the linearized operator is invertible (i.e. uniformly
elliptic), and can be solved for the relevant boundary conditions.

For $u \in C^{2,\alpha}(M^+)$ we have already argued the uniform ellipticity of the linearization
(29). To linearize the boundary conditions $Du_t(M^+) \subset M_t$, introduce a sufficiently
smooth and strongly convex function $h : \mathbf{R}^n \to \mathbf{R}$ whose sublevel sets $M_t = \{y \in \mathbf{R}^n \mid
h(y) \le t\}$ give the domains $M_t := \mathrm{spt}\, f_t$, and rewrite the non-linear boundary condition
in the form $h(Du_t(x)) \le t$ with equality on $\partial M^+$. Linearizing this in u yields the
boundary condition of the linear equation for w:

(31) $Dh(Du_t(x)) \cdot Dw(x) = 0$ on $\partial M^+$.

For the linear problem (29) to be well-posed, we need a uniformly non-tangential
prescribed gradient for w on $\partial M^+$. Since Dh parallels the normal $\hat n_{M_t}$ to $M_t$, this amounts
to the uniform obliqueness estimate

$\hat n_{M_t}(Du_t(x)) \cdot \hat n_{M^+}(x) \ge \delta > 0$ (obliqueness)

provided by Urbas [135], with $\delta$ depending only on coarse bounds for the data. This
concludes the sketch that $T \subset [0,1]$ is open: well-posedness of the linear problem (29)
(31) when $t = t_0 \in T$ implies the existence of solutions $u_t \in C^{2,\alpha}(M^+)$ to the nonlinear
problem (30) for any t close enough to $t_0$.

Both approaches (renormalization and continuity method) have been extended in 
recent years to more general costs, and this will be the topic of the next few lectures. 



4. Regularity and counterexamples for general costs 

4.1. Examples. The development of a regularity theory for general cost functions
satisfying appropriate hypotheses on compact domains $M^\pm \subset \mathbf{R}^n$ began with the work of
Ma, Trudinger and Wang [89]. Prior to that there were regularity results only for a few
special costs, such as:

Example 4.1 (Bilinear cost). $c(x,y) = -x \cdot y$ or equivalently $c(x,y) = |x - y|^2/2$ [35]
[T5] Qi] [135], and its restriction to $M^\pm = \partial B_1(0)$ in $\mathbf{R}^n$ (Gangbo and McCann [60]).

Example 4.2 (Logarithmic cost; conformal geometry and reflector antenna design).
$c(x,y) = -\log|x - y|$ appearing in conformal geometry (cf. Viaclovsky's review [137]),
and its restriction to the Euclidean unit sphere, which is relevant to reflector antenna
design (Glimm and Oliker [63], X.-J. Wang [140] [141]) and helped to inspire Wang's
subsequent collaborations [89] [133] [134] [83] with Trudinger, Ma, and Liu.
In the wake of Ma, Trudinger and Wang's [89] results, many new examples have
emerged of cost functions which satisfy [Hi] [31] [53] [3S] (5^ [H] [73] [71] [75] [7S] [77] 
[ST] [55] [EH] — or which violate [5J] [55J — their sufficient conditions (A0)-(A4) and 
(A3)$_s$ for regularity from §4.4 below, not to mention the subsequent variants (A3) and
(B3) introduced by Trudinger and Wang [133] and Kim and McCann [73] respectively,
on the crucial condition (A3)$_s$. Among the most interesting of these are the geometrical
examples and counterexamples of Loeper: 

Example 4.3 (Sphere). $c(x,y) = \frac{1}{2} d^2_{S^n}(x,y)$ on the round sphere satisfies (A3)$_s$ [84]
(and (B3) [74]);

Example 4.4 (Saddle). $c(x,y) = \frac{1}{2} d^2_{H^n}(x,y)$ on hyperbolic space $M^\pm = H^n$ violates (A3)$_s$
(and (A3) [85]), as does the Riemannian distance squared cost on any Riemannian
manifold $M^+ = M^-$ which has (at least one) negative sectional curvature at some point
$x \in M$.

4.2. Counterexamples to the continuity of optimal maps. For any cost function
which violates (A3), Loeper went further to show there are probability measures $d\mu^\pm =
f^\pm d^n x$ with smooth positive densities bounded above and below, so that $\log f^\pm \in
C^\infty(M^\pm)$, for which the unique optimal map $G : M^+ \to M^-$ is discontinuous [55].
Let's see why this is so for the quadratic cost given on either the hyperbolic plane or a
saddle surface as in Example 4.4.

Consider transportation from the uniform measure $\mu^+$ on a sufficiently small ball to
a target measure consisting of three point masses $\mu^- = \frac{1}{3}\sum_{i \le 3} \delta_{y_i}$ near the center of
the ball, choosing $y_2$ to be the midpoint of $y_1$ and $y_3$. In this case Theorem 2.9 provides
constants $v_1, \ldots, v_3$ and a c-convex function

(32) $u(x) = \max\{u_i(x) \mid i = 1,2,3\}$ where $u_i(x) = -c(x,y_i) - v_i$,

such that the optimal map $G_\#\mu^+ = \mu^-$ satisfies $G^{-1}(y_i) = \{x \in M^+ \mid u(x) = u_i(x)\}$.
We interpret $-v_i$ to be the value of the good at the potential destination $y_i \in \mathrm{spt}\,\mu^-$;
the producer at $x \in M^+$ will ship his good to whichever target point $y_i$ provides the
greatest value after transportation costs are deducted (32); here the values $v_1, \ldots, v_3$
are adjusted to balance supply with demand, so that each of the three regions $G^{-1}(y_i)$
contains 1/3 of the mass of $\mu^+$. For the Euclidean distance-squared cost these three
regions are easily seen to be convex sets, while for the spherical distance-squared they
remain connected. For the hyperbolic distance-squared, however, the 'middle' region
$G^{-1}(y_2)$ consists of two disconnected components, near opposite sides of the ball $\mathrm{spt}\,\mu^+$
(see figure below, or for instance Figure 1 of [73]). This disconnectedness is the hallmark
of costs for which (A3) fails, and allowed Loeper to construct counterexamples to the
continuity of optimal mappings as follows.

In the preceding discussion, $\mu^-$ was not given by a smooth positive density; still it can
be approximated by a sequence of measures $\mu^-_\epsilon := \mu^- * \eta_\epsilon$ which are. Now consider the
reverse problem of transporting $\mu^-_\epsilon$ to $\mu^+$. Call the optimal map for this new problem
$x = G^-_\epsilon(y)$. For $\delta > 0$, taking $\epsilon > 0$ sufficiently small ensures for each $1 \le i \le 3$ that
nearly 1/3 of the mass of $\mu^-_\epsilon$ concentrates near $y_i$ and is mapped into a $\delta$-neighbourhood
of $G^{-1}(y_i)$. Intuitively, for $\delta$ sufficiently small, this forces a discontinuity of $G^-_\epsilon$ which
tears the region near $y_2$ into at least two disconnected components: nearly half of the
mass near this point must map to each disconnected component of $G^{-1}(y_2)$; see Figure
2. This construction shows why the distance-squared cost on a hyperbolic or saddle
surface cannot generally produce smooth optimal transport maps.
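The supply-and-demand balancing behind (32) can be made concrete in the simplest Euclidean setting, where the cells are expected to be convex. The sketch below is an illustration under stated assumptions, not the hyperbolic construction itself: the source is uniform on $[-1,1]$, the cost is $c(x,y) = -xy$, the three targets are $-a, 0, a$, and the constants $v = (a/3, 0, a/3)$ were worked out by hand so that the cell boundaries fall at $x = \pm 1/3$.

```python
def assign_cells(num_points=3000, a=0.1):
    """Assign each sample x in [-1, 1] to the target maximizing u_i(x) in (32)."""
    ys = (-a, 0.0, a)          # three collinear targets, y_2 the midpoint
    vs = (a / 3, 0.0, a / 3)   # hand-computed constants balancing the masses
    cells = []
    for k in range(num_points):
        x = -1.0 + (2 * k + 1) / num_points         # midpoint rule samples
        values = [x * y - v for y, v in zip(ys, vs)]  # u_i(x) = -c(x, y_i) - v_i
        cells.append(values.index(max(values)))
    return cells

def summarize(cells):
    """Return the number of samples per cell and the number of cell changes."""
    counts = [cells.count(i) for i in range(3)]
    switches = sum(1 for p, q in zip(cells, cells[1:]) if p != q)
    return counts, switches
```

With these choices each target receives one third of the samples and the cell index changes only twice along the line, so every cell $G^{-1}(y_i)$ is an interval. For the hyperbolic cost discussed above, the middle cell would instead split into two pieces.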
Figure 2: A tear occurs when spreading a triply peaked density uniformly over the saddle.

There is another obstruction to the continuity of G, namely failure of convexity (at least
when $M^\pm \subset \mathbf{R}^n$) of the support of $\mu^-$. This was shown by Caffarelli [18] with the
following elementary example: consider $u : \mathbf{R}^2 \to \mathbf{R}$ given by

$u(x) = |x_1| + \frac{1}{2}|x|^2$, $x = (x_1,x_2)$.

If we consider the cost $c(x,y) = -x \cdot y$, then $y = Du(x)$ gives the optimal transport
map between the unit disc (with Lebesgue measure) and two shifted half discs (Figure
3); in particular, the transport map is discontinuous across $\{x_1 = 0\}$.
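The tear in this example can be seen by evaluating the gradient map on either side of the line $\{x_1 = 0\}$. The following short sketch assumes the potential $u(x) = |x_1| + |x|^2/2$ as read from the text; the function names and the step `eps` are illustrative.

```python
import math

def grad_u(x1, x2):
    """Gradient of u(x) = |x1| + |x|^2 / 2, defined away from the line x1 = 0."""
    if x1 == 0:
        raise ValueError("u is not differentiable on {x1 = 0}")
    return (x1 + math.copysign(1.0, x1), x2)

def jump_across_axis(x2, eps=1e-9):
    """Difference between the values of Du just right and just left of x1 = 0."""
    right = grad_u(eps, x2)
    left = grad_u(-eps, x2)
    return (right[0] - left[0], right[1] - left[1])
```

The map shifts the right half disc by $+e_1$ and the left half disc by $-e_1$, so the jump across the axis is approximately $(2, 0)$ at every height $x_2$, while away from the axis the Jacobian is the identity and Lebesgue measure is preserved.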









Figure 3. Disconnected targets also produce tears, as do non-convex targets ($\mathrm{spt}\,\mu^-$).

We turn now to conditions which rule out these types of examples, and lead to positive
regularity results.

4.3. Monge-Ampère type equations. For the quadratic cost, finding a smooth optimal
map was equivalent to solving the 2nd boundary value problem for the Monge-Ampère
equation. Let us now derive the analogous equation for a more general cost,
keeping in mind that whatever PDE we end up with cannot generally be better than
degenerate-elliptic, since vanishing of $d\mu^- = f^- d\mathrm{Vol}$ can lead to non-smooth solutions
in the interior of the support of $d\mu^+ = f^+ d\mathrm{Vol}$.

Let us see what specific PDE emerges from the local expression $|\det DG(x)| =
f^+(x)/f^-(G(x))$ for $G_\#\mu^+ = \mu^-$. Recall from Corollary 2.7 that $D^2_{xx}c(x,G(x)) +
D^2u(x) \ge 0$ and $D_x c(x,G(x)) + Du(x) = 0$. Differentiating the latter expression gives
a relation

$D^2_{xx}c(x,G(x)) + D^2_{xy}c(x,G(x))\,DG(x) + D^2u(x) = 0$

which can be solved for $DG(x)$ to yield

(33) $\det\big(D^2u(x) + D^2_{xx}c(x,Y(x,Du(x)))\big) = \Big[\, |\det D^2_{xy}c(x,y)|\, \frac{f^+(x)}{f^-(y)} \Big]_{y = Y(x,Du(x))}$.
Here we have assumed (A1)-(A2), the form $G(x) = Y(x,Du(x))$ of the optimal map
is from Theorem 2.9 and the boundary condition is $Y(x,Du(x)) \in M^-$ for all $x \in M^+$.
We have arrived as before at a fully-nonlinear second-order equation, whose linearization
around any c-convex solution $u = u^{cc}$ is degenerate-elliptic.
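For readers who want the intermediate algebra, the step "solved for $DG(x)$" can be spelled out as follows. This is a brief sketch in which every term is evaluated at $(x, G(x))$ with $G(x) = Y(x,Du(x))$, and invertibility of $D^2_{xy}c$ is supplied by (A2):

```latex
\begin{align*}
D^2_{xy}c \; DG &= -\big(D^2u + D^2_{xx}c\big), \\
|\det D^2_{xy}c| \; |\det DG| &= \det\big(D^2u + D^2_{xx}c\big)
  \qquad \text{(using } D^2u + D^2_{xx}c \ge 0\text{)}, \\
\det\big(D^2u + D^2_{xx}c\big) &= |\det D^2_{xy}c| \; \frac{f^+(x)}{f^-(G(x))}.
\end{align*}
```

The last line substitutes $|\det DG(x)| = f^+(x)/f^-(G(x))$ and is equation (33). The non-negativity of $D^2u + D^2_{xx}c$ is also the source of the degenerate ellipticity noted above.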

4.4. Ma-Trudinger-Wang conditions for regularity. Sufficient conditions for the
c-optimal map $G : M^+ \to M^-$ to be smooth between a pair of smooth bounded
probability densities satisfying $\log f^\pm \in C^\infty(M^\pm)$ on compact domains $M^\pm \subset \mathbf{R}^n$ were
found by Ma-Trudinger-Wang and Trudinger-Wang [89] [133]. The crucial condition
on the cost $c(x,y)$ distinguishing Examples 4.1-4.3 from Example 4.4 above involves a
quantity they identified, which other authors have variously dubbed the Ma-Trudinger-Wang
tensor [139], c-sectional curvature [85], or cross-curvature [73]; c.f. ^6] below. To
define it, let us adopt their convention that subscripts such as $c_{i,j} = \partial^2 c/\partial x^i \partial y^j$ and
$c_{ij,kl} = \partial^4 c/\partial x^i \partial x^j \partial y^k \partial y^l$ indicate iterated derivatives in coordinates, with commas
separating derivatives with respect to $x \in M^+$ from those with respect to variables
$y \in M^-$. Let $c^{k,l}(x,y)$ denote the inverse matrix to $c_{i,j}(x,y)$.

Definition 4.5 (Cross-curvature). Given tangent vectors $p \in T_{x_0}M^+$ and $q \in T_{y_0}M^-$,
define $\mathrm{cross}(p,q) := (-c_{ij,kl} + c_{ij,r}\, c^{r,m}\, c_{m,kl})\, p^i p^j q^k q^l$. Here and subsequently, the
Einstein summation convention is in effect.

The conditions assumed by Ma, Trudinger and Wang were the following [89]; our
designations (A3)$_s$ and (A3) correspond to their (A3) and (A3w) from [133]:

(A0) $c \in C^4(M^+ \times M^-)$, and for all $(x_0,y_0)$ in the compact set $M^+ \times M^- \subset \mathbf{R}^n \times \mathbf{R}^n$;

(A1) $y \in M^- \mapsto D_x c(x_0,y)$ and $x \in M^+ \mapsto D_y c(x,y_0)$ are injective;

(A2) $\det D^2_{x^i y^j} c(x_0,y_0) = \det(c_{i,j}) \ne 0$;

(A3) $\mathrm{cross}(p,q) \ge 0$ for all $(p,q) \in T_{(x_0,y_0)}M^+ \times M^-$ such that $p^i c_{i,j} q^j = 0$;

(A4) $M^-_{x_0} := D_x c(x_0,M^-) \subset \mathbf{R}^n$ and $M^+_{y_0} := D_y c(M^+,y_0) \subset T^*_{y_0}M^-$ are convex.

Among the variants on (A3) subsequently proposed [75] [SB] [S3], let us recall the
non-negative cross-curvature condition [73]:

(B3) $\mathrm{cross}(p,q) \ge 0$ for all $(p,q) \in T_{(x_0,y_0)}M^+ \times M^-$.



The first two conditions above are familiar from Theorems 2.9 and 3.2. (A1) was
proposed independently of [8j5] in [57] [SU], while there is an antecedent for (A2) in
the economics literature [91]. The last condition (A4) adapts the convexity required
by Delanoë, Caffarelli (Theorem 3.4) and Urbas, to the geometry of the cost function
$c(x,y)$; when $M^+_{y_0}$ and $M^-_{x_0}$ are smooth and their convexity is strong, meaning the
principal curvatures of their boundaries are all strictly positive, we denote it by (A4)$_s$.
When inequality (A3) or (B3) holds strictly, and hence uniformly on the compact set
$M^+ \times M^-$, we denote that fact by (A3)$_s$ or (B3)$_s$, respectively.

Remark 4.6. The quadratic cost $c(x,y) = -x \cdot y$ of Brenier satisfies (B3) but not (A3)$_s$.
Since we have already seen that (A3) is necessary [5S] as well as sufficient for the
continuity of optimal maps, the quadratic cost is actually a delicate borderline case.
The negative $-c$ of any cost c satisfying (A3)$_s$, including those of Examples 4.2-4.3,
necessarily violates (A3).
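The borderline status of the bilinear cost can be read off directly from Definition 4.5. As a short worked computation, for $c(x,y) = -x \cdot y$ on $\mathbf{R}^n \times \mathbf{R}^n$:

```latex
\begin{align*}
c_{i,j} &= -\delta_{ij}, \qquad c_{ij,r} = 0, \qquad c_{m,kl} = 0, \qquad c_{ij,kl} = 0, \\
\operatorname{cross}(p,q) &= \big(-c_{ij,kl} + c_{ij,r}\,c^{r,m}\,c_{m,kl}\big)\,p^i p^j q^k q^l = 0
\quad \text{for all } (p,q).
\end{align*}
```

Thus the inequalities in (A3) and (B3) hold with equality everywhere and can never be strict, which is why this cost satisfies (B3) but not (A3)$_s$.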
4.5. Regularity results. Assume (A0)-(A4) and $\log f^\pm \in C^\infty(M^\pm)$. Under the
stronger condition (A3)$_s$, Ma, Trudinger and Wang [89] proved the interior regularity
of the optimal map G and corresponding c-convex potential $u \in C^\infty(M^+_{int})$; a flaw
in their argument was later repaired in [134] (see [73] for another approach). Substituting
strong convexity (A4)$_s$ for (A3)$_s$, but retaining (A3), Trudinger and Wang [133]
used the continuity method to establish regularity up to the boundary $u \in C^\infty(\overline{M^+})$.
Relaxing strong convexity to (A4) in that context is an open problem.

For densities merely satisfying $f^+/f^- \in L^\infty(M^+ \times M^-)$, under the strong condition
(A3)$_s$, Loeper was able to establish local Hölder continuity of the optimal map, or
equivalently $u \in C^{1,\alpha}_{loc}(M^+_{int})$, with explicit Hölder exponent $\alpha = 1/(4n-1)$, using a
direct argument [85] that we sketch out below. This exponent was later improved to
its sharp value $\alpha = 1/(2n-1)$ by Liu [52]. For the quadratic cost $c(x,y) = -x \cdot y$, the
best known estimates [56] for the Hölder exponent $\alpha$ are much worse, and depend on
bounds for $\log(f^+/f^-)$. Assuming non-negative cross-curvature (B3) and (A4)$_s$ instead
of (A3)$_s$, Figalli, Kim and McCann adapted Caffarelli's renormalization techniques [47]
to derive continuity and injectivity of optimal maps but without any Hölder exponent;
using one of their arguments, a similar conclusion was obtained by Figalli and Loeper
[5U] in the special case $n = 2$ assuming only (A3) and (A4)$_s$. Liu, Trudinger and Wang
showed that higher regularity then follows from further assumptions on $f^+/f^-$ in any
dimension [SB].

Using this theory, regularity results have now been obtained in geometries such as the 
round sphere [53], perturbations [35] [S3] [SI], submersions [33] [H] and products [35J 
thereof, and hyperbolic space [5T] [75J. Significant cut-locus issues arise in this context. 
Loeper and Villani [86] conjecture, and in some cases have proved, that condition (A3)$_s$
on the quadratic cost $c(x,y) = d^2(x,y)$ actually implies convexity of the domain of
injectivity of the Riemannian exponential map $\exp_x : T_xM \to M$.

4.6. Ruling out discontinuities: Loeper's maximum principle. Let us discuss
how the condition (A3) rules out the tearing phenomenon which we saw on the saddle
surface of Example 4.4.

Discontinuities in the optimal map $G(x) = Y(x,Du(x))$ correspond to locations
$x_0 \in M^+$ where differentiability of the potential function $u = u^{cc}$ fails, such as locations
where the supremum

(34) $u(x_0) = \sup_{y \in M^-} -c(x_0,y) - u^c(y)$

is attained by two or more points $y_0 \ne y_1 \in M^-$.




Figure 4. Discontinuous optimal maps arise from distinct supporting hyperplanes. 

The set of such y is denoted by $\partial^c u(x_0)$, while the set of such pairs $(x_0,y)$ is denoted $\partial^c u \subset
M^+ \times M^-$. Unless we can find a continuous curve $t \in [0,1] \mapsto y_t \in \partial^c u(x_0)$ which
connects $y_0$ to $y_1$, it will be possible [85] to construct probability densities satisfying
$\log f^\pm \in C^\infty(M^\pm) \cap L^\infty$ with a discontinuous optimal map as in §4.2 above. But there
are not many possibilities to have such a curve.

In the classical case $c(x,y) = -x \cdot y$ (see Figures 4 and 5), $\partial^c u = \partial u$ and we have a
convex function u with two different supporting planes at $x_0$.

Figure 5. Can one c-affine support at $x_0$ be rotated to another, without exceeding u?
(Curves shown: $\max\{f_0(x), f_1(x)\}$ and $f_t(x) = -c(x,y_t) + c(x_0,y_t) + u(x_0)$.)

In particular one may continuously rotate the first plane about the point $(x_0,u(x_0))$
without ever crossing the graph of u until it agrees with the second plane, giving a
one-parameter family of supporting planes to u at the same point. This way one sees that
$\partial u(x_0)$ contains a "segment" $\{y_t\}_{t \in [0,1]}$. In this special case $y_t = (1-t)y_0 + ty_1$ where
$y_0$ and $y_1$ are the slopes of the original supporting hyperplanes. In the general case, the
corresponding local picture forces the "c-segment" $\{y_t\}_{t \in [0,1]}$ given by

(35) $D_x c(x_0,y_t) = (1-t)D_x c(x_0,y_0) + tD_x c(x_0,y_1)$

to be our only hope for a continuous path connecting $y_0$ to $y_1$ in $\partial^c u(x_0)$.

Now, were G to exhibit a discontinuity, this construction suggests G has to transport
a very small mass around $x_0$ into a set with very large mass, which would give a
contradiction given the constraint $G_\#\mu^+ = \mu^-$ and the assumptions on $\mu^\pm$. This is
indeed the case, at least under the stronger assumption (A3)$_s$, as we shall see below
(Proposition 4.12 and Theorem 4.11).



All of this is of course contingent on whether the entire family of functions $\{f_t\}_t$
lies below $u(x)$, which might not be true for an arbitrary cost c; see Figure 5. Indeed,
Loeper's key observation is that the Ma-Trudinger-Wang condition (A3) is what guarantees
that any family of functions $f_t(x) = -c(x,y_t) + c(x_0,y_t)$ with $y_t$ satisfying (35)
never goes above $u(x)$. More precisely, it remains below $\max\{f_0(x), f_1(x)\}$.

Theorem 4.7 (Loeper's maximum principle [5S] [73]). If (A0)-(A4) hold and $x_0 \in M^+$
and $(y_t)_{t \in [0,1]} \subset M^-$ satisfy (35), then

(36) $f(x,t) := -c(x,y_t) + c(x_0,y_t) \le \max\{f(x,0), f(x,1)\}$ $\forall\, (x,t) \in M^+ \times [0,1]$.

Remark 4.8. ([74]) If in addition (B3) holds, then $t \in [0,1] \mapsto f(x,t)$ is convex.
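In the bilinear case both statements reduce to a one-line check, which may help fix ideas. With $c(x,y) = -x \cdot y$ one has $D_x c(x_0,y) = -y$, so the c-segment (35) is the ordinary segment and:

```latex
\begin{align*}
y_t &= (1-t)\,y_0 + t\,y_1, \\
f(x,t) &= x \cdot y_t - x_0 \cdot y_t = (x - x_0) \cdot y_t
        = (1-t)\,f(x,0) + t\,f(x,1) \;\le\; \max\{f(x,0),\, f(x,1)\}.
\end{align*}
```

Here $f(x,\cdot)$ is affine in t, hence convex, consistent with Remark 4.8 since this cost satisfies (B3). For a general cost the dependence on t is nonlinear and the inequality is exactly what (A3) is needed for.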

Loeper's original proof was quite tortuous, relying on global regularity results for
optimal transportation already established by Trudinger and Wang [133]. Here we
sketch instead a simple, direct proof due to Kim and McCann [73], who later added
Remark 4.8. A preliminary lemma gives some insight into the relevance of the
cross-curvature.
Lemma 4.9 (A non-tensorial expression for cross-curvature [73]). Assuming (A0)-(A4),
if $(x(s))_{-1 \le s \le 1} \subset M^+$ and $(y(t))_{-1 \le t \le 1} \subset M^-$ satisfy either

$\frac{d^2}{dt^2} D_x c(x(0),y(t)) = 0$ or $\frac{d^2}{ds^2} D_y c(x(s),y(0)) = 0$,

then $\mathrm{cross}(\dot x(0),\dot y(0)) = -\frac{\partial^4}{\partial s^2 \partial t^2}\Big|_{s=0=t} c(x(s),y(t))$.



This lemma is precisely analogous to the formula for the distance between two
arclength parameterized geodesics $x(s)$ and $y(t)$ passing through $x(0) = y(0)$ in a
Riemannian manifold:

$d^2(x(s),y(t)) = s^2 + t^2 - 2st\cos\theta - \frac{k}{3} s^2 t^2 \sin^2\theta + O((s^2 + t^2)^{5/2})$

where $\theta$ is the angle between $\dot x(0)$ and $\dot y(0)$ and k is the sectional curvature of the plane
which they span. Therefore, we will not give its proof.



Proof of Remark 4.8 and sketch of Theorem 4.7. Assume (A3)$_s$ for simplicity. It suffices
to prove the following claim.

Claim 1: if $\frac{\partial f}{\partial t}(x,t_0) = 0$ then $\frac{\partial^2 f}{\partial t^2}(x,t_0) > 0$.

Proof of Claim 1: Convexity (A4) allows us to define $s \in [0,1] \mapsto x(s)$ by

(37) $D_y c(x(s),y(t_0)) = (1-s)D_y c(x_0,y(t_0)) + sD_y c(x,y(t_0))$

and $g(s) = \frac{\partial^2 f}{\partial t^2}(x(s),t_0)$. Our claim is that $g(1) > 0$. Since $f(x_0,t) = 0$ and hence
$g(0) = 0$, to prove Claim 1 it suffices to establish strict convexity in Claim 2.

Claim 2: $g : [0,1] \to \mathbf{R}$ is convex, and minimized at $s = 0$.

Proof of Claim 2: Once $g(s)$ is known to be convex, we need only observe that

$g'(0) = -\frac{\partial^3}{\partial s\, \partial t^2}\Big|_{(s,t)=(0,t_0)} c(x(s),y(t))$

vanishes by our choice (35) of $y(t) = y_t$, to conclude $g(s)$ is minimized at $s = 0$.

Why should $g(s)$ be convex? Note that $g''(s) = -\frac{\partial^4}{\partial s^2 \partial t^2}\Big|_{t=t_0} c(x(s),y(t))$ is already
non-negative according to Lemma 4.9 if we assume (B3). Remark 4.8 is thereby established.
Under the weaker condition (A3), we need $\dot x^i(s)\, c_{i,j}\, \dot y^j(t_0) = 0$ to conclude $g(s)$
is convex, and strictly convex if (A3)$_s$ holds. But

$0 = \frac{\partial f}{\partial t}(x,t_0) = -\int_0^1 c_{i,j}(x(s),y(t_0))\, \dot x^i(s)\, \dot y^j(t_0)\, ds$

and the integrand is constant by our construction (37) of $x(s)$. □
To deduce the continuity result of the next section, the following corollary is crucial.

Corollary 4.10. Assume (A0)-(A4) and fix $(x_0,y_0) \in M^+_{int} \times M^-$. If $u = u^{cc}$ satisfies

(38) $u(x) \ge -c(x,y_0) + c(x_0,y_0) + u(x_0)$

in a neighbourhood of $x_0$, the same inequality holds for all $x \in M^+$.



Proof. The local inequality (38) implies $p_0 := -D_x c(x_0,y_0) \in \partial u(x_0)$. If $x_0 \in \mathrm{dom}\,Du$,
the conclusion is easy. The global inequality

$u(x) \ge -c(x,y_1) + c(x_0,y_1) + u(x_0)$

holds for any $(x_0,y_1) \in \partial^c u$, and for $y_1 \in \arg\max_{y \in M^-} -c(x_0,y) - u^c(y)$ in particular.
The twist condition (A1) then implies $y_0 = y_1$.

Even if $x_0 \notin \mathrm{dom}\,Du$, taking e.g. $p = -D_x c(x_0,y_1)$ yields

(39) $-c(x,Y(x_0,p)) + c(x_0,Y(x_0,p)) \le u(x) - u(x_0)$ $\forall\, x \in M^+$.

In fact, the set $P = \{p \in \mathbf{R}^n \mid \text{(39) holds}\}$ is convex, according to Theorem 4.7. On the
other hand, P includes all the extreme points p of $\partial u(x_0)$, since the preceding argument
can be applied to a sequence $(x_k,y_k) \in \partial^c u \cap (\mathrm{dom}\,Du \times M^-)$ with $(x_k,Du(x_k)) \to
(x_0,p)$. Thus $P \supset \partial u(x_0)$, whence $p_0 \in P$ as desired. (In fact, $P = \partial u(x_0)$.) □

4.7. Interior Hölder continuity for optimal maps. To conclude our discussion on
regularity of optimal mappings, let us sketch Loeper's Hölder continuity result [85].

Theorem 4.11 (Loeper '09). Assume (A0)-(A2) and (A4). (i) If (A3) is violated, there
exist probability densities with $\log f^\pm \in C^\infty(M^\pm) \cap L^\infty$ and a discontinuous optimal map
$G : M^+_{int} \to M^-$ satisfying $G_\#(f^+ d\mathrm{Vol}) = f^- d\mathrm{Vol}$. (ii) Conversely, if (A3)$_s$ holds
and $f^+/f^- \in L^\infty(M^+ \times M^-)$, then $G \in C^{1/(4n-1)}_{loc}(M^+_{int}, M^-)$.

In one dimension $n = 1$, we see $G$ is Lipschitz directly from the equation $G'(x) =
f^+(x)/f^-(G(x))$. In higher dimensions, this theorem is a direct consequence of the
following proposition, whose inequalities $\lesssim$ hold up to multiplicative constants depending
only on the cost $c$, and in particular on the size of the uniform modulus of positivity in
condition (A3)$_s$.
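The one-dimensional remark can be checked numerically. The sketch below is an illustration only, under assumptions that are not in the text: the source density $f^+$ is uniform on $[0,1]$, the target density is $f^-(y) = 1/2 + y$ on $[0,1]$, and the optimal map is taken to be the monotone increasing rearrangement (as it is for the quadratic cost). The helper names `monotone_map` and `check_ode` are hypothetical.

```python
def f_plus(x):
    # Assumed source density: uniform on [0, 1].
    return 1.0


def f_minus(y):
    # Assumed target density on [0, 1]; it integrates to 1 and is >= 1/2.
    return 0.5 + y


def cdf_minus(y):
    # Cumulative distribution function of f_minus.
    return 0.5 * y + 0.5 * y * y


def monotone_map(x, tol=1e-13):
    """Solve cdf_minus(G) = x (the cdf of the uniform source) by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf_minus(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def check_ode(x, h=1e-5):
    """Return (finite-difference G'(x), f_plus(x) / f_minus(G(x)))."""
    lhs = (monotone_map(x + h) - monotone_map(x - h)) / (2 * h)
    rhs = f_plus(x) / f_minus(monotone_map(x))
    return lhs, rhs


for x in (0.1, 0.5, 0.9):
    lhs, rhs = check_ode(x)
    print(f"x={x}: G'(x) ~ {lhs:.6f}, f+/f-(G(x)) = {rhs:.6f}")
```

For these assumed densities the map has the closed form $G(x) = (\sqrt{1+8x} - 1)/2$, and the Lipschitz bound $\sup f^+ / \inf f^- = 2$ is attained at $x = 0$.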

Proposition 4.12 (Sausage into ball [85]). Assuming the hypotheses and notation of
Theorem 4.11(ii), take $x_0, x_1 \in M^+$ and set $\Delta x = x_1 - x_0$ and $\Delta y = y_1 - y_0$ where
$y_i = G(x_i)$. If $|\Delta x| < |\Delta y|^5$, there is a ball $B_\epsilon(\bar x) \supset G^{-1}(S_\delta)$ of radius $\epsilon \sim \sqrt{|\Delta x|/|\Delta y|}$
centered on the line segment joining $x_0$ to $x_1$, containing the preimage of the "sausage"

$$S_\delta = \{ y \in M^- \mid \inf_{t \in [1/3,2/3]} |y - y_t| < \delta \}$$

of radius $\delta \sim \epsilon|\Delta y|^2$ around the middle third of the curve $(y_t)_{t \in [0,1]} \subset M^-$ satisfying
$0 = \frac{d^2}{dt^2} D_x c(\bar x, y_t)$.

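Proof of Theorem 4.11(ii). At pairs of points $y_i = G(x_i)$ where $|\Delta x| \ge |\Delta y|^5$ we
already have Hölder exponent $1/5$, even better than claimed. At other points, using
the fact that $G$ is a transport map between $\mu^\pm = f^\pm dx$, the Proposition yields
$\mu^-(S_\delta) = \mu^+(G^{-1}(S_\delta)) \lesssim \|f^+\|_\infty \epsilon^n$, but also $\delta^{n-1}|\Delta y| \inf_{M^-} f^- \lesssim \mu^-(S_\delta)$.
Combining the squares of these two inequalities, our choices $\delta \sim \epsilon|\Delta y|^2$ and
$\epsilon^2 \sim |\Delta x|/|\Delta y|$ yield the desired Hölder estimate:

$$\|(f^-)^{-1}\|_{L^\infty}^{-2}\, \epsilon^{2n-2} |\Delta y|^{4n-2} \lesssim \|f^+\|_{L^\infty}^{2}\, \epsilon^{2n-2} \frac{|\Delta x|}{|\Delta y|}.$$

Thus $G \in C^{1/(4n-1)}_{loc}$. □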
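For readers who want the exponent bookkeeping spelled out, the following short computation shows where the exponent $1/(4n-1)$ comes from. It is only a sketch: it combines the two measure bounds with the stated choices of $\delta$ and $\epsilon$, and absorbs the density bounds and all constants into $\lesssim$.

```latex
\begin{align*}
\delta^{n-1}|\Delta y| \;\lesssim\; \mu^-(S_\delta) \;&\lesssim\; \epsilon^{n}
  && \text{(the two measure bounds)}\\
\epsilon^{n-1}|\Delta y|^{2n-1} \;&\lesssim\; \epsilon^{n}
  && (\text{substitute } \delta \sim \epsilon|\Delta y|^2)\\
|\Delta y|^{4n-2} \;&\lesssim\; \epsilon^{2} \sim \frac{|\Delta x|}{|\Delta y|}
  && (\text{cancel } \epsilon^{n-1}, \text{ then square})\\
|\Delta y| \;&\lesssim\; |\Delta x|^{1/(4n-1)}.
\end{align*}
```

For $n \ge 2$ the exponent $1/(4n-1)$ is at most $1/7$, which is why the exponent $1/5$ obtained when $|\Delta x| \ge |\Delta y|^5$ is better than required.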
The proposition relies delicately on Corollary 4.10 and the correct choice of $\delta$ and $\epsilon$:

Proof sketch of Proposition 4.12; c.f. [72]: According to Theorem 2.9, the optimal map
$G(x) = Y(x, Du(x))$ is given by a potential $u = u^{cc}$ and $\mathrm{Graph}(G) \subset \partial^c u$. Thus
$(x_i,y_i) \in \partial^c u$, meaning $f_i(x) = -c(x,y_i) + c(x_i,y_i) + u(x_i)$ satisfies $u(x) \ge \max\{f_0(x), f_1(x)\}$
with equality at $x_0$ and $x_1$. Take $\bar x$ to be the point on the segment joining $x_0$ to $x_1$
where $f_0(\bar x) = f_1(\bar x)$ ($= 0$ without loss of generality). The semiconvexity of $u$ shown in
Lemma 3.1 then yields the bound $u(\bar x) \lesssim |\Delta x||\Delta y| + |\Delta x|^2$. Assumption (A3)$_s$ allows
Theorem 4.7 to be quantified, so that $f_t(\cdot) := -c(\cdot, y_t) + c(\bar x, y_t) + u(\bar x) \le u(\cdot)$ actually
satisfies

$$f_t(x') - u(x') \lesssim -t(1-t)\,|x' - \bar x|^2\, |\Delta y|^2$$

for $x'$ near $\bar x$. For $t \in [1/3, 2/3]$, these estimates give some leeway to shift $y_t$ up to
distance $\delta$ without spoiling the inequality $g_y(x') := -c(x',y) + c(\bar x,y) + u(\bar x) \le u(x')$
on the boundary $x' \in \partial B_\epsilon(\bar x)$. Since $g_y(\bar x) = u(\bar x)$, this inequality does not extend to
the interior of the ball $B_\epsilon(\bar x)$, unless we subtract some non-negative constant from $g_y(\cdot)$.
Subtracting the smallest such constant $\lambda$ yields a function $g_y(\cdot) - \lambda \le u(\cdot)$ on $B_\epsilon(\bar x)$,
with equality at some $x^* \in B_\epsilon(\bar x)$. Corollary 4.10 implies $(x^*,y) \in \partial^c u$. For almost
every such $y \in S_\delta$ this provides the desired preimage $x^* \in G^{-1}(y)$. □



5. Multidimensional screening: an application to economic theory 

We now sketch an application [49] of the mathematics we have developed to one of
the central problems in microeconomic theory: making pricing or policy decisions for 
a monopolist transacting business with a field of anonymous agents whose preferences 
are known only statistically. Economic buzzwords associated with problems of this 
type include "asymmetric information," "mechanism design," "incentive compatibility," 
"nonlinear pricing," "signalling," "screening," and the "principal / agent" framework. 

5.1. Monopolist nonlinear pricing and the principal-agent framework. To describe
the problem, imagine we are given: a set of "customer" types $M^+ \subset \mathbf{R}^n$ and
"product" types $M^- \subset \mathbf{R}^n$ and

$b(x,y)$ = benefit of product $y \in M^-$ to customer $x \in M^+$;

$a(y)$ = monopolist's cost to manufacture $y \in M^-$;

$d\mu^+(x) \ge 0$ relative frequency of different customer types on $M^+$.

Knowing all this data, the monopolist's problem is to assign a price to each product,
for which she will be willing to manufacture that product and sell it to whichever
agents choose to buy it. Her task is to design the price menu $v : M^- \to \mathbf{R} \cup \{+\infty\}$
so as to maximize profits. The only constraint that prevents her from raising prices
arbitrarily high is the existence of a fixed $y_\emptyset \in M^-$, called the "outside option" or
"null product", which she is compelled to sell at cost $v(y_\emptyset) = a(y_\emptyset)$. Though it is not
necessary, we can fix the cost of the null product to vanish without loss of generality.

The agent's problem consists in computing

(40) $u(x) = \max_{y \in M^-} b(x,y) - v(y)$

and choosing to buy that product $y_{b,v}(x)$ for which the maximum is attained. The
monopolist is generally called the principal, while the customers are called agents.
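The agent's problem (40) is easy to illustrate on a finite menu. The sketch below is not taken from the literature cited here: the one-dimensional bilinear benefit, the three-item price menu, and the names `agent_choice` and `bilinear_benefit` are all hypothetical choices made for illustration.

```python
def agent_choice(x, menu, benefit):
    """Solve the agent's problem (40) for a finite menu.

    menu is a list of (product, price) pairs; the return value is the pair
    (indirect utility u(x), product chosen by an agent of type x).
    """
    product, price = max(menu, key=lambda item: benefit(x, item[0]) - item[1])
    return benefit(x, product) - price, product


def bilinear_benefit(x, y):
    # Hypothetical benefit b(x, y) = x * y in one dimension.
    return x * y


# Hypothetical price menu; the null product y = 0 is offered at price 0.
menu = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.6)]

for x in (0.3, 0.6, 1.0):
    surplus, product = agent_choice(x, menu, bilinear_benefit)
    print(f"type x={x}: buys y={product}, indirect utility u={surplus:.2f}")
```

With this menu the low type $x = 0.3$ selects the null product and receives zero surplus, while higher types buy progressively larger products. Since $u$ is a maximum of functions affine in $x$, it is convex, which is what $b$-convexity amounts to for this bilinear $b$.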

Economists use this framework to model many different types of transactions,
including tax policy [103] (where the government wants to decide a tax structure which
encourages people both to work and report income), contract theory [126] (where a
company wants to decide a salary structure which attracts and rewards effective employees
without overpaying them), and the monopolist nonlinear pricing problem described
above [106]. In the initial studies, the type spaces were assumed one-dimensional,
with $x \in M^+$ representing the innate ability or talent of the prospective tax-payer or
employee, and $y \in M^-$ the amount of work that he chooses to do or the credentials
he chooses to acquire. The basic insight of Mirrlees and Spence was that under
condition (A2) (which implies (A1) in a single dimension) the variables $x, y \in \mathbf{R}$ would
be monotonically correlated by the optimal solution, reducing the monopolist's
problem to an ordinary differential equation. For this reason the one-dimensional versions
of (A1)-(A2) are called Spence-Mirrlees (or single-crossing) conditions in the
economics literature; both Mirrlees and Spence were awarded Nobel prizes for exploring the
economic implications of their solution.

Of course, many types of products are more realistically modeled using several
parameters $y \in \mathbf{R}^n$ (in the case of cars these might include fuel efficiency, size, comfort,
safety, reliability, and appearance), while the preferences of customers for such
parameters are similarly nuanced. Thus it is natural and desirable to want to solve the
multidimensional version $n \ge 2$ of the problem, about which much less is known [8].
Monteiro and Page [105] and independently Carlier [22] showed only that enough
compactness remains to conclude that the monopolist's optimal strategy exists. An earlier
connection to optimal transportation can be discerned in the work of Rochet [117], who
proved a version of Theorem 2.3 (Rockafellar) for general utility functions $b$ ($= -c$ in
our earlier notation).



Rochet and Choné [118] studied the special case $b(x, y) = x \cdot y$ on $M^\pm = [0,\infty)^2$.
Taking $a(y) = \frac12 |y|^2$, $d\mu^+ = \chi_{[0,1]^2}\, d^2x$, and $y_\emptyset = (0,0)$, they deduced that the mapping
$y_{b,v} : M^+ \to M^-$ was the gradient of a convex function; it sends a positive fraction of
the square to the point mass $y_\emptyset$, and a positive fraction to the line segment $y_1 = y_2$,
while the remaining positive fraction gets mapped in a bijective manner, so that

(41) $\mu^- := (y_{b,v})_\# \mu^+ = f_0^- \delta_{y_\emptyset} + f_1^-\, d\mathcal{H}^1 + f_2^-\, d\mathcal{H}^2$.

They interpreted this solution to mean that while the top end of the market gets
customized vehicles $f_2^-$, price discrimination alone forces those customers in the next
market segment to choose from a more limited set $f_1^-$ of economy vehicles offering a
compromise between attributes $y_1$ and $y_2$. A fraction $f_0^- > 0$ of consumers will be priced
out of the market altogether, which had already been observed by Armstrong [7] to
be a hallmark of nonlinear pricing in more than one dimension $n \ge 2$. Economists refer
to this general phenomenon (41) as "bunching", and to the fact that $f_0^- > 0$ as "the
desirability of exclusion."

How robust is this picture? It remains a pressing question to understand whether
the bunching phenomenon of Rochet and Choné is robust, or merely an accident of the
particular example they explored. As we now explain, their results were obtained by
reducing the monopolist's problem to the minimization of a Dirichlet energy:

(42) $\min_{0 \le u \text{ convex}} \int_{[0,1]^2} \left[ \frac12 |Du|^2 - \langle x, Du(x) \rangle + u(x) \right] d\mathcal{H}^2(x)$.

The constraint that $u : M^+ \to \mathbf{R}$ be convex makes this problem non-standard: its
solution satisfies a Poisson type equation only on the set where $u$ is strongly convex
($D^2 u > 0$), and there are free boundaries separating the regions where the different
constraints $u \ge 0$ and $D^2 u \ge 0$ begin to bind.

5.2. Variational formulation using optimal transportation. The principal's
problem is to choose $v : M^- \to \mathbf{R} \cup \{+\infty\}$ to maximize her profits, or equivalently to
minimize her net losses:

(43) $\min_{\{v \mid v(y_\emptyset) = a(y_\emptyset)\}} \int_{M^+} [a(y_{b,v}(x)) - v(y_{b,v}(x))]\, d\mu^+(x)$.

Note that the integrand vanishes for all customers $x$ who choose the null product
$y_{b,v}(x) = y_\emptyset$.
Wherever the agent's maximum (40) is achieved, we have $Du(x) - D_x b(x, y_{b,v}(x)) = 0$,
so using the twist condition (A1) from our previous lectures we can invert this relation
to get $y_{b,v}(x) = Y(x, Du(x))$. Moreover, the function $u(x)$ from (40) is a $b$-convex
function, called the surplus or indirect utility $u = u^{bb}$. (In our previous notation, $u$ is a
$(-b)$-convex function and $u = u^{(-b)(-b)}$, but we suppress the minus signs hereafter.)
Since $v(Y(x, Du(x))) = b(x, Y(x, Du(x))) - u(x)$, we may reformulate the variational
problem (43) as the minimization of the principal's losses

(44) $L(u) := \int_{M^+} [a(Y(x,Du(x))) - b(x,Y(x,Du(x))) + u(x)]\, d\mu^+(x)$

over the set $U_\emptyset = \{u \in U \mid u \ge u_\emptyset\}$ of $b$-convex functions $U = \{u \mid u = u^{bb}\}$ which
exceed the reservation utility $u_\emptyset(\cdot) = b(\cdot, y_\emptyset) - a(y_\emptyset)$ associated with the outside option
or null product. This strange reformulation due to Carlier [22]

(45) $\min_{u \in U_\emptyset} L(u)$

reduces to (42) in the case considered by Rochet and Choné.
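To see this reduction concretely, here is the computation in the Rochet-Choné setting. It is a sketch that uses only the data stated above, namely $b(x,y) = x \cdot y$, $a(y) = \frac12|y|^2$ and $y_\emptyset = (0,0)$, together with the definition of $Y$ through $D_x b(x, Y(x,p)) = p$.

```latex
\begin{align*}
D_x b(x,y) = y \quad &\Longrightarrow\quad Y(x,p) = p, \\
a(Y(x,Du)) - b(x,Y(x,Du)) + u \;&=\; \tfrac12 |Du(x)|^2 - \langle x, Du(x)\rangle + u(x), \\
u_\emptyset(x) = b(x,y_\emptyset) - a(y_\emptyset) = 0 \quad &\Longrightarrow\quad
U_\emptyset \subset \{u \text{ convex},\ u \ge 0\}.
\end{align*}
```

The second line is the integrand of (42), and the third explains its constraint: for bilinear $b$ every $b$-convex function is a supremum of affine functions, hence convex, and the reservation utility vanishes identically.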



5.3. When is this optimization problem convex? From [105] and [22] we know
that a minimizer exists. The contribution of Figalli, Kim and McCann is to give
sufficient conditions for the variational problem to become convex, in which case it is
considerably simpler to analyze, theoretically and computationally. It is very interesting
that the Ma, Trudinger and Wang criteria for the regularity of optimal mappings turn
out to be related to this question. The following are among the main results of [49]:

Theorem 5.1 (Convexity of the principal's strategy space [49]). If $b = (-c)$ satisfies
(A0)-(A2) and (A4) then the set $U = \{u = u^{bb}\}$ is convex if and only if (B3) holds, i.e.,
if and only if $\mathrm{cross}(p, q) \ge 0$ for all tangent vectors $(p,x_0) \in TM^+$ and $(q,y_0) \in TM^-$.

Remark 5.2. It was pointed out subsequently by Brendan Pass [111] that the convexity
of $D_x b(x_0, M^-)$ assumed in (A4) for each $x_0 \in M^+$ is also necessary for convexity of $U$.

Sketch of proof. First assume (B3) holds, assuming always (A0)-(A2) and (A4).
Given $u_0,u_1 \in U$ and $t \in [0,1]$ we claim $u_t := (1-t)u_0 + tu_1$ is $b$-convex. This
can be established by finding for each $x_0 \in M^+$ a $y_t \in M^-$ such that

(46) $u_t(\cdot) \ge b(\cdot,y_t) - b(x_0,y_t) + u_t(x_0)$ throughout $M^+$,

for then $u_t(\cdot)$ is the supremum of such functions. Corresponding to $t = 0, 1$ the desired
points $y_0,y_1 \in M^-$ exist, by $b$-convexity of $u_i = u_i^{bb}$ for $i = 0,1$. By (A4), we can
solve the equation $D_x b(x_0,y_t) = (1-t)D_x b(x_0, y_0) + tD_x b(x_0, y_1)$; the solution $y_t \in M^-$
makes $f(\cdot, t) := b(\cdot, y_t) - b(x_0, y_t)$ a convex function of $t \in [0, 1]$, according to Remark 4.
Inequality (46) holds at the endpoints $t = 0, 1$; taking a convex combination yields the
desired inequality for intermediate values of $t \in [0,1]$. For the converse direction, we
refer to [49]. □

Theorem 5.3 (Convexity of principal's losses and uniqueness of optimal strategy [49]).
If (A0)-(A4) and (B3) hold and if $a = a^{bb}$, then the functional $u \in U \mapsto L(u)$ defined
by (44) is convex. Furthermore, it has enough strict convexity to conclude the optimizer
$u \in U_\emptyset$ is uniquely determined (at least $\mu^+$-a.e.) if $\mu^+ \ll \mathcal{H}^n$ and either (i) $y_{b,a} : M^+ \to
M^-$ is continuous or else (ii) $b$ has positive cross-curvature (B3)$_s$.

Sketch of proof. To deduce convexity of $L : U \to \mathbf{R}$, recall $b$-convexity of $a$ implies

$$a(Y(x,p)) - b(x,Y(x,p)) = \sup_{x_1 \in M^+} b(x_1, Y(x,p)) - b(x,Y(x,p)) - a^b(x_1).$$

For each $x \in M^+$ fixed, the functions under the supremum are convex with respect
to $p \in D_x b(x, M^-)$, according to Remark 4.8. Thus the integrand in (44) is linear in $u(x)$
and convex with respect to $p = Du(x)$, which establishes the desired convexity of the
integral $L(u)$. In case (i) the integrand is strictly convex, while in case (ii) it is strictly
convex for all $x \in \mathrm{dom}\, Da^b$, which is a set of full $\mu^+ \ll \mathcal{H}^n$ measure. We refer to [49]
for details. □



Regarding robustness: we may mention that, as in Remark 4.6, the bilinear function
$b(x, y)$ lies on the borderline of costs which satisfy (B3). Thus there will be perturbations
of this function which destroy convexity of the problem, and we can anticipate that
under such perturbations, uniqueness and other properties of its solution may no longer
persist. In fact, for $a = 0$ and $b(x, y) = -d^2_M(x, y)$ on a Riemannian ball $M^+ = M^- =
B_r(y_\emptyset)$, we arrive at a problem equivalent to a fourfold symmetrized version of Rochet
and Choné's in the Euclidean case, but which satisfies or violates (B3) depending on
whether the metric is spherical or hyperbolic. This can be used to model local delivery of
a centralized resource for a town in the mountains [49]; cf. Examples 4.3 and 4.4. On the
other hand, under the hypotheses of Theorem 5.3 we are able to show that Armstrong's
"desirability of exclusion" [7] continues to hold. We give the statement only and refer
to [49] for its proof.

Theorem 5.4 (The desirability of exclusion). Assume (A0)-(A4), (B3), $a = a^{bb}$ and
that $d\mu^+ = f^+ d\mathcal{H}^n$ with $f^+ \in W^{1,1}(M^+)$ and the convex set $D_y b(M^+, y_\emptyset) \subset \mathbf{R}^n$
has no $(n-1)$-dimensional facets. Then a positive fraction of agents will be priced out
of the market by the principal's optimal strategy.

Remark 5.5. It is interesting to note that the strict convexity condition on $D_y b(M^+, y_\emptyset)$ holds
neither in one dimension, where Armstrong noted counterexamples to the desirability
of exclusion, nor for the example of Rochet-Choné, where convexity of $D_y b(M^+, y_\emptyset)$ is not
strict.

5.4. Variant: maximizing social welfare. Idealistic readers may be taken somewhat 
aback by the model just presented, for it is the very theory which predicts, among other 
things, just how uncomfortable airlines ought to make their economy seating to ensure
(without sacrificing too much economy-class revenue) that passengers with the
means to secure a business-class ticket have sufficient incentive to do so. Such readers
will doubtless be glad to know that the same mathematics is equally relevant to the 
more egalitarian question of how to price public services so as to maximize societal 
benefit. 

For example, suppose the welfare $w(x, u(x))$ of agent $x$ is a concave function of the
indirect utility $u(x)$ he receives. A public service provider would like to set a price menu
for which $u = v^b$ maximizes the total welfare among all agents:

$$\max_{u \in U_\emptyset,\ L(u) \le 0} \int_{M^+} w(x, u(x))\, d\mu^+(x),$$

subject to the constraint $L(u) \le 0$ that the service provider not sustain losses.
Introducing a Lagrange multiplier $\lambda$ for this budget constraint, the problem can be rewritten
in the unconstrained form

$$\max_{u \in U_\emptyset} -\lambda L(u) + \int_{M^+} w(x, u(x))\, d\mu^+(x),$$

(for a suitable $\lambda \ge 0$). Under the same assumptions (A0)-(A4), (B3) and $a = a^{bb}$ as
before, we see from the results above that this becomes a concave maximization problem
for which existence and uniqueness of solution follow directly, and which is therefore
quite amenable to further study, both theoretical and computational.

6. A pseudo-Riemannian and symplectic geometric afterword

The conditions of Ma, Trudinger and Wang for regularity of optimal transport have
a differential geometric significance uncovered by Kim and McCann [73], which led to
their discovery with Warren [75] of a surprising connection of optimal transport to the
theory of volume-maximizing special Lagrangian submanifolds in split geometries.

Indeed, since the smoothness of optimal maps $G : M^+ \to M^-$ is a question whose
answer is independent of coordinates chosen on $M^+$ and $M^-$, it follows that the
necessary and sufficient condition (A3) for continuity in Theorem 4.11 should have a
geometrically invariant description. We give this description, below, as the positivity
of certain sectional curvatures of a metric tensor $h$ induced on the product manifold
$N := M^+ \times M^-$ by the cost function $c \in C^4(N)$. This motivates the appellation
cross-curvature. The rationale for such a description to exist is quite analogous to that
underlying general relativity, Einstein's theory of gravity, which can be expressed in the
language of pseudo-Riemannian geometry due to the coordinate invariance that results
from the equivalence principle (observer independence).

Use the cost function to define the symmetric and antisymmetric tensors

(47) $h = -\sum_{i=1}^n \sum_{j=1}^n \frac{\partial^2 c}{\partial x^i \partial y^j} \left( dx^i \otimes dy^j + dy^j \otimes dx^i \right)$

(48) $\omega = -\sum_{i=1}^n \sum_{j=1}^n \frac{\partial^2 c}{\partial x^i \partial y^j} \left( dx^i \otimes dy^j - dy^j \otimes dx^i \right)$

on $N = M^+ \times M^-$. Then condition (A2) is equivalent to non-degeneracy of the metric
tensor $h$, which in turn is equivalent to the assertion that $\omega$ is a symplectic form. Note
however that $h$ is not positive-definite, but has equal numbers of positive and negative
eigenvalues in any chosen coordinates, i.e. signature $(n, n)$. Conditions (A3) and (A4) are
conveniently re-expressed in terms of the pseudo-metric $h$, and its pseudo-Riemannian
curvature tensor $R_{ijkl}$ [73]. Indeed, condition (A4) asserts the $h$-geodesic convexity of
$\{x_0\} \times M^-$ and $M^+ \times \{y_0\}$, while the formula

$$\mathrm{cross}(p, q) = R_{ijkl}\, p^i q^j p^k q^l$$

shows the cross-curvature is simply proportional to the pseudo-Riemannian sectional
curvature of the 2-plane $(p \oplus 0) \wedge (0 \oplus q)$. The restriction distinguishing (A3) from (B3)
is that $p \oplus q$ be lightlike, which is equivalent to the $h$-orthogonality of $p \oplus 0$ and $0 \oplus q$.

Kim and McCann went on to point out that the graph of any $c$-optimal map is
$h$-spacelike and $\omega$-Lagrangian, meaning any tangent vectors $P,Q \in T_{(x,G(x))}N$ to this
graph satisfy $h(P,P) \ge 0$ and $\omega(P,Q) = 0$. This is a consequence of Corollary 2.7; it
is also very illustrative to check it directly by hand in the case of the quadratic cost in
$\mathbf{R}^n$. When (A0)-(A4) hold they also showed the converse to be true: any
diffeomorphism whose graph is $h$-spacelike and $\omega$-Lagrangian is also $c$-cyclically monotone, hence
optimal. With Warren [75], they introduced a pseudo-metric

(49) $h^f_c = \left( \frac{f^+(x)\, f^-(y)}{|\det c_{i,j}(x,y)|} \right)^{1/n} h$

conformally equivalent to $h$. In this new metric, they show the graph of the $c$-optimal
map pushing $d\mu^+(x) = f^+(x)\,d^nx$ forward to $d\mu^-(y) = f^-(y)\,d^ny$ has maximal volume
with respect to compactly supported perturbations. In particular, $\mathrm{Graph}(G)$ has zero
mean-curvature as a submanifold (with half the dimension) of $(N, h^f_c)$, yielding an
unexpected connection of optimal transportation to more classical problems in geometry
and geometric measure theory. (Note that the metric (49) depends only on the measures
$\mu^\pm$ and the sign of the mixed partial $D^2_{xy} c$ in dimension $n = 1$.) The preprint of Harvey
and Lawson [65] contains a wealth of related information concerning special Lagrangian
submanifolds in pseudo-Riemannian (= semi-Riemannian) geometry.
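The check "by hand" for the quadratic cost can also be mimicked numerically. The sketch below rests on assumptions that should be read as such: the cost is $c(x,y) = |x-y|^2/2$ on $\mathbf{R}^2$, the tensors $h$ and $\omega$ are taken with coefficient $-\partial^2 c/\partial x^i \partial y^j = \delta_{ij}$, and the optimal map is the linear map $G(x) = Ax$ for a symmetric positive-definite matrix $A$ (the gradient of a convex quadratic). The function names and the matrix $A$ are hypothetical.

```python
import random


def h_form(P, Q, n):
    """Symmetric form h(P, Q) for the quadratic cost, with P = (p, dp)."""
    p, dp = P[:n], P[n:]
    q, dq = Q[:n], Q[n:]
    return sum(p[i] * dq[i] + dp[i] * q[i] for i in range(n))


def omega_form(P, Q, n):
    """Antisymmetric form omega(P, Q) for the same cost."""
    p, dp = P[:n], P[n:]
    q, dq = Q[:n], Q[n:]
    return sum(p[i] * dq[i] - dp[i] * q[i] for i in range(n))


def tangent_to_graph(p, A):
    """Tangent vector (p, A p) to the graph of the linear map G(x) = A x."""
    n = len(p)
    return p + [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]


# Hypothetical symmetric positive-definite matrix: G is the gradient of x.Ax/2.
A = [[2.0, 0.5], [0.5, 1.0]]
n = 2
random.seed(0)
for _ in range(5):
    P = tangent_to_graph([random.uniform(-1, 1) for _ in range(n)], A)
    Q = tangent_to_graph([random.uniform(-1, 1) for _ in range(n)], A)
    # Spacelike: h(P, P) = 2 p.Ap >= 0.  Lagrangian: omega(P, Q) = p.Aq - Ap.q = 0.
    print(h_form(P, P, n) >= 0, abs(omega_form(P, Q, n)) < 1e-12)
```

Positivity of $h(P,P)$ uses the positive-definiteness of $A$, while the vanishing of $\omega(P,Q)$ uses only its symmetry. Evaluating `h_form` on $(e_i, e_i)$ and $(e_i, -e_i)$ gives $+2$ and $-2$, which exhibits the signature $(n,n)$ for this cost.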